Cursor gets slower and slower

  • Hi all,

    I am writing a stored procedure to populate a table nightly with approximately 2,700,000 records. Each record has a ranking field and total field that must be populated (e.g. 5 out of 25). The rankings are based on groupings of the records. The groups are zip-sic where sic is a category used in the application. Last time I ran the procedure, it took 8 hrs to finish.

    My issue is that when the procedure begins it is much faster. As the cursor steps through the zip-sic combinations it gets slower and slower. During the first minute 74,046 rows were inserted yielding an estimated 36 minutes to completion. With each passing minute this figure decreases thus increasing the time to completion. These are the figures I got for the first 4 minutes:

    1- 74,046 (36 min)

    2- 59,854 (40 min)

    3- 42,197 (46 min)

    4- 38,193 (50 min)

    It took 57 minutes to insert 979,906 records making the estimated time to completion 157 minutes.

    I thought that an index on the insertion table def_rankings, might be the cause if it was continually re-indexing. So I delete the index and then rebuild it at the end of the procedure. That helped but the times above are without the index. My thought is that I am using the temporary tables in the wrong way so that the memory never gets released as the cursor moves forward. Either that or maybe there is some other kind of statistic that is being recalculated with each insert that could be turned off until the cursor is finished. I have also tried getting rid of the cursor and adding an ID to the zip-sic results so that I could loop over them with a while-loop. This actually is slower than the cursor with the first minute completing only 22,878 insertions. And the while-loop still has the problem of slowing down as the procedure continues. I have also tried FORWARD_ONLY STATIC and FAST_FORWARD cursors. They both have the slowing problem.

    I know enough to be dangerous, so there may be some obvious corrections in the way I have coded this. Any assistance would be most appreciated. Here is the procedure:

    IF EXISTS (SELECT name FROM sysobjects

    WHERE name = 'sp_CreateRankings2' AND type = 'P')

    DROP PROCEDURE sp_CreateRankings2

    GO

    /* Stored procedure to create rankings */

    CREATE PROC sp_CreateRankings2

    AS

    --CLEAR OLD RESULTS

    truncate table def_rankings

    --DELETE THE INDEX FOR FASTER INSERTIONS

    DROP INDEX [dbo].[def_rankings].[Index_BE]

    --CREATE A TEMP TABLE TO UNIQUELY HOLD THE CITY-STATE FOR EACH ZIP (chooses longest name for each city where duplicates exist)

    select *

    INTO #temp_map_city2zip

    from map_city2zip c2z

    where 1=1

    and city = (select top 1 city from map_city2zip where zip = c2z.zip order by len(city) desc)

    order by zip

    --DECLARE MyCursor CURSOR FORWARD_ONLY STATIC FOR

    DECLARE MyCursor CURSOR FAST_FORWARD FOR

    -- Select all mapped city-zips crossed with all industries

    SELECT distinct c2z.zip, zf.sic

    FROM #temp_map_city2zip c2z INNER JOIN zipfindzips2 zfz on c2z.zip = zfz.zip

    inner join zipfind zf on zfz.adjacentzip = zf.zip

    and c2z.zip IN (SELECT zip FROM zipsupdated)

    DECLARE @zip varchar(10), @sic int

    OPEN MyCursor

    FETCH NEXT FROM MyCursor INTO @zip, @sic

    WHILE (@@FETCH_STATUS = 0)

    BEGIN

    --THE FIRST TEMP INSERT IS USED TO GET THE RIGHT ORDER

    SELECT zfz.zip, be_no, sic,(points1 + 60-distance) as score, pp_bid, c.city, c.state, distance

    INTO #TempRankings2

    FROM zipfind z INNER JOIN zipfindzips2 zfz ON z.zip = zfz.adjacentzip

    inner join #temp_map_city2zip c on c.zip = zfz.zip

    WHERE distance <= 60
    AND zfz.zip = @zip
    AND z.sic = @sic
    ORDER BY pp_bid desc, score desc, distance, be_no

    --ONLY THEN CAN WE ASSIGN THE RIGHT RANK
    SELECT zip, be_no, sic,score, pp_bid, city, state, distance, identity(int, 1, 1) as rank, 0 as total
    INTO #TempRankings
    FROM #TempRankings2

    --ONLY THEN CAN WE INSERT THE TOTAL NUMBER
    INSERT INTO def_rankings (zip, be_no, sic, score, pp_bid, city, state, distance, rank, total)
    SELECT zip, be_no, sic, score, pp_bid, city, state, distance, rank, @@IDENTITY
    FROM #TempRankings

    DROP TABLE #TempRankings
    DROP TABLE #TempRankings2

    FETCH NEXT FROM MyCursor INTO @zip, @sic
    END

    DROP TABLE #temp_map_city2zip

    --REBUILD THE INDEX
    CREATE CLUSTERED INDEX [Index_BE] ON [dbo].[def_rankings]([be_no], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

    DEALLOCATE MyCursor
    GO

    Thanks again for any help you can offer.

    Sincerely,
    Danny

  • There is probably a set based solution to this process, but I don't see it yet.  One change I think may help is in the declaration of the cursor.  Try this and see if it produces the same results for your cursor:

    DECLARE MyCursor CURSOR FAST_FORWARD FOR

    -- Select all mapped city-zips crossed with all industries

    SELECT DISTINCT

        c2z.zip,

        zf.sic

    FROM

        #temp_map_city2zip c2z

        INNER JOIN zipfindzips2 zfz

            on (c2z.zip = zfz.zip )

        INNER JOIN zipfind zf

            on (zfz.adjacentzip = zf.zip)

        INNER JOIN zipsupdated zu

            on (c2z.zip = zu.zip)

    It eliminates the subquery and may help.  All you can do is test, test, and test some more.  I am sure others will look at this may provide other alternatives.

  • Try to run a profiler when running your procedure in order to see which part of your sp is resource consuming.

    Hope this helps

    By the way Group by is known to be resource consuming

  • Hi Danny,

     

    Another thing that you could try is to created indexes on your temp tables this may help the query joins the tables start to get bigger and bigger.

     

     

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Also,  If you have to use a cursor (no one comes up with a set-based solution), you should try creating the the two # temp tables you use inside your while loop outside the loop and use insert into and truncate on the tables in the loop, then drop the # temp tables after the loop is done (or just let SQL Server drop them when the stored procedure finishes).

  • I would have to agree with Lynn on this one.

    How may loops is your cursor doing?

    It's not recomended to user cursors all the time as they have been know to lock up memory and records.

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • It is desireable to eliminate cursors if possible.  Sometimes, however, they are the proper choice for a specific task.  Not having your data handy to test against, it is difficult to know.  As I look at your procedure I think I see a way to eliminate the cursor, but it may not work in your situation as it appears you may be using SQL Server 2000.  If true, are you or your company looking at upgradingto SQL Server 2005 in the foreseable future?

  • Thanks! For the great posts. I am using SQL2000 and as far as I know there are no plans to update. The cursor currently loops over about 120,000 records, and that is bound to grow. My non-cursor version looks something like this excerpt:

    SELECT distinct c2z.zip, zf.sic, identity(int, 1, 1) as ID

    INTO #zipsic

    FROM #temp_map_city2zip c2z INNER JOIN zipfindzips2 zfz on c2z.zip = zfz.zip

    inner join zipfind zf on zfz.adjacentzip = zf.zip

    and c2z.zip IN (SELECT zip FROM zipsupdated)

    WHILE EXISTS (SELECT TOP 1 * FROM #zipsic)

    BEGIN

    SELECT TOP 1 @id = ID FROM #zipsic

    --THE FIRST TEMP INSERT IS USED TO GET THE RIGHT ORDER

    SELECT zfz.zip, be_no, zs.sic,(points1 + 60-distance) as score, pp_bid, c.city, c.state, distance

    INTO #TempRankings2

    FROM zipfind z INNER JOIN zipfindzips2 zfz ON z.zip = zfz.adjacentzip

    inner join #temp_map_city2zip c on c.zip = zfz.zip

    inner join #zipsic zs on zs.zip = c.zip and zs.sic = z.sic and zs.id = @id

    WHERE distance <= 60

    ORDER BY pp_bid desc, score desc, distance, be_no

    --ONLY THEN CAN WE ASSIGN THE RIGHT RANK

    SELECT zip, be_no, sic,score, pp_bid, city, state, distance, identity(int, 1, 1) as rank, @@ROWCOUNT as total

    INTO #TempRankings

    FROM #TempRankings2

    --ONLY THEN CAN WE INSERT THE TOTAL NUMBER

    INSERT INTO def_rankings (zip, be_no, sic, score, pp_bid, city, state, distance, rank, total)

    SELECT zip, be_no, sic, score, pp_bid, city, state, distance, rank, total --@@IDENTITY

    FROM #TempRankings

    DROP TABLE #TempRankings

    DROP TABLE #TempRankings2

    DELETE FROM #zipsic WHERE ID = @id

    END

    Even though I am able to get rid of the cursor as shown above, I still have to loop over the records one at a time. I guess that does not qualify as a set-based treatment.

    Morpheus, I haven't used the profiler before so I'll read up on how to do so. This sounds like it could be a big help in figuring out what is going on. Even though I conceptually do the rankings by group, I don't actually use the GROUP keyword in the procedure. But definitely good to know!

    Lynn, I will take your improved select for a test run. I would love to hear if you think of a way to do this task set-wise. Also, I am not sure I can use INSERT INTO on both temp tables because I get an error saying IDENTITY() can only be used with a SELECT with an INTO clause. But I will use this hint wherever it lets me. Just need to look up how to declare a temp table before actually using it.

    Christopher, I will also try indexes on the temp tables. But the only table that should be getting larger is the def_rankings table, and it is only used for INSERTS. Since the #temp_map_city2zip and the cursor do not change, and #temprankings and #temprankings2 are dropped each time through the cursor, I wonder if this will solve the slowing problem. No doubt that it will help shave time off the cursor overall, though.

    Thanks again for the great hints! Let me know if there is some set-related solution that someone can envision.

    Danny

  • You can define the identity column in the create table when creating the # temp tables, then you won't need the IDENTITY() function in the insert statement.  This will also be reset when you truncate the # temp table at the end of each iteration of the loop.

     

  • Sweet! What a great idea Lynn!

  • Also, you may want to put an index on this # temp table: #temp_map_city2zip.

  • One last change to suggest.  In your procedure you are using @@IDENTITY, you may want to change that to SCOPE_IDENTITY().  If, during your processing an insert to another table with an identity field occurs you may get that value instead of the one you expect.

    You may want to read up on these in BOL.

  • Could you post the DDL for the tables used in the procedure and some sample data, perhapes enough for 3 quick iterations of representative data?

  • Hi Lynn, Sorry for the slow response. I am not sure how to get you the info you requested except to paste it into the body. I cannot figure out how to attach files. These tables are all large (even for 3 zips) so I will separate the data into different posts for each table. I just output csv files so that is what I am sending. If there is another way to get a scripted set of data, I am unfamiliar with how to do that.

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[map_city2zip]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[map_city2zip]

    GO

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipFindZips2]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[ZipFindZips2]

    GO

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipsUpdated]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[ZipsUpdated]

    GO

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipFind]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[ZipFind]

    GO

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[def_rankings]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[def_rankings]

    GO

    CREATE TABLE [dbo].[map_city2zip] (

    [city] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [state] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [zip] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[ZipFindZips2] (

    [zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [adjacentZip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [distance] [tinyint] NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[ZipsUpdated] (

    [zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [getdate] [smalldatetime] NOT NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[ZipFind] (

    [BE_NO] [int] NOT NULL ,

    [zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [points1] [float] NULL ,

    [points2] [float] NULL ,

    [points3] [float] NULL ,

    [sic] [int] NOT NULL ,

    [tid] [int] NOT NULL ,

    [pp_bid] [money] NOT NULL ,

    [dateCreated] [smalldatetime] NOT NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[def_rankings] (

    [zip] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [be_no] [int] NOT NULL ,

    [sic] [int] NOT NULL ,

    [score] [int] NOT NULL ,

    [pp_bid] [money] NULL ,

    [city] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [state] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [distance] [int] NULL ,

    [rank] [int] NULL ,

    [total] [int] NULL ,

    [dateCreated] [smalldatetime] NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[map_city2zip] WITH NOCHECK ADD

    CONSTRAINT [PK_map_city2zip] PRIMARY KEY CLUSTERED

    (

    [city],

    [state]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ZipFindZips2] WITH NOCHECK ADD

    CONSTRAINT [PK_ZipFindZips2] PRIMARY KEY CLUSTERED

    (

    [zip],

    [adjacentZip]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ZipsUpdated] WITH NOCHECK ADD

    CONSTRAINT [PK_ZipsUpdated] PRIMARY KEY CLUSTERED

    (

    [zip]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ZipFind] WITH NOCHECK ADD

    CONSTRAINT [PK_ZipFind] PRIMARY KEY CLUSTERED

    (

    [BE_NO],

    [zip],

    [sic],

    [tid]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ZipsUpdated] WITH NOCHECK ADD

    CONSTRAINT [DF_ZipsUpdated_getdate] DEFAULT (getdate()) FOR [getdate]

    GO

    ALTER TABLE [dbo].[ZipFind] WITH NOCHECK ADD

    CONSTRAINT [DF__zipFind__pp_bid__6E4C3B47] DEFAULT (0) FOR [pp_bid],

    CONSTRAINT [DF__zipFind__dateCre__6F405F80] DEFAULT (getdate()) FOR [dateCreated]

    GO

    ALTER TABLE [dbo].[def_rankings] WITH NOCHECK ADD

    CONSTRAINT [DF_danny_dateCreated] DEFAULT (getdate()) FOR [dateCreated]

    GO

    CREATE INDEX [Index_ZipFindZips2_zip] ON [dbo].[ZipFindZips2]([zip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Index_ZipFindZips2_adj_zip_distance] ON [dbo].[ZipFindZips2]([adjacentZip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Index_ZipFindZips2_all] ON [dbo].[ZipFindZips2]([zip], [adjacentZip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

    CREATE INDEX [Index_ZipFindZip2_zip_only] ON [dbo].[ZipFindZips2]([zip]) WITH FILLFACTOR = 90 ON [PRIMARY]

    GO

  • map_city2zip

    ==================

    miami,FL,33101

    miami beach,FL,33109

    new york,NY,10001

    new york city,NY,10001

    ny,NY,10001

    nyc,NY,10001

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply