Cursor gets slower and slower

Question

Cursor gets slower and slower

dzigrino

SSC Veteran

Points: 264
More actions
July 11, 2007 at 8:52 am

#175220

Hi all,
I am writing a stored procedure to populate a table nightly with approximately 2,700,000 records. Each record has a ranking field and total field that must be populated (e.g. 5 out of 25). The rankings are based on groupings of the records. The groups are zip-sic where sic is a category used in the application. Last time I ran the procedure, it took 8 hrs to finish.
My issue is that when the procedure begins it is much faster. As the cursor steps through the zip-sic combinations it gets slower and slower. During the first minute 74,046 rows were inserted yielding an estimated 36 minutes to completion. With each passing minute this figure decreases thus increasing the time to completion. These are the figures I got for the first 4 minutes:
1- 74,046 (36 min)
2- 59,854 (40 min)
3- 42,197 (46 min)
4- 38,193 (50 min)
It took 57 minutes to insert 979,906 records making the estimated time to completion 157 minutes.
I thought that an index on the insertion table def_rankings, might be the cause if it was continually re-indexing. So I delete the index and then rebuild it at the end of the procedure. That helped but the times above are without the index. My thought is that I am using the temporary tables in the wrong way so that the memory never gets released as the cursor moves forward. Either that or maybe there is some other kind of statistic that is being recalculated with each insert that could be turned off until the cursor is finished. I have also tried getting rid of the cursor and adding an ID to the zip-sic results so that I could loop over them with a while-loop. This actually is slower than the cursor with the first minute completing only 22,878 insertions. And the while-loop still has the problem of slowing down as the procedure continues. I have also tried FORWARD_ONLY STATIC and FAST_FORWARD cursors. They both have the slowing problem.
I know enough to be dangerous, so there may be some obvious corrections in the way I have coded this. Any assistance would be most appreciated. Here is the procedure:
IF EXISTS (SELECT name FROM sysobjects
WHERE name = 'sp_CreateRankings2' AND type = 'P')
DROP PROCEDURE sp_CreateRankings2
GO
/* Stored procedure to create rankings */
CREATE PROC sp_CreateRankings2
AS
--CLEAR OLD RESULTS
truncate table def_rankings
--DELETE THE INDEX FOR FASTER INSERTIONS
DROP INDEX [dbo].[def_rankings].[Index_BE]
--CREATE A TEMP TABLE TO UNIQUELY HOLD THE CITY-STATE FOR EACH ZIP (chooses longest name for each city where duplicates exist)
select *
INTO #temp_map_city2zip
from map_city2zip c2z
where 1=1
and city = (select top 1 city from map_city2zip where zip = c2z.zip order by len(city) desc)
order by zip
--DECLARE MyCursor CURSOR FORWARD_ONLY STATIC FOR
DECLARE MyCursor CURSOR FAST_FORWARD FOR
-- Select all mapped city-zips crossed with all industries
SELECT distinct c2z.zip, zf.sic
FROM #temp_map_city2zip c2z INNER JOIN zipfindzips2 zfz on c2z.zip = zfz.zip
inner join zipfind zf on zfz.adjacentzip = zf.zip
and c2z.zip IN (SELECT zip FROM zipsupdated)
DECLARE @zip varchar(10), @sic int
OPEN MyCursor
FETCH NEXT FROM MyCursor INTO @zip, @sic
WHILE (@@FETCH_STATUS = 0)
BEGIN
--THE FIRST TEMP INSERT IS USED TO GET THE RIGHT ORDER
SELECT zfz.zip, be_no, sic,(points1 + 60-distance) as score, pp_bid, c.city, c.state, distance
INTO #TempRankings2
FROM zipfind z INNER JOIN zipfindzips2 zfz ON z.zip = zfz.adjacentzip
inner join #temp_map_city2zip c on c.zip = zfz.zip
WHERE distance <= 60
AND zfz.zip = @zip
AND z.sic = @sic
ORDER BY pp_bid desc, score desc, distance, be_no
--ONLY THEN CAN WE ASSIGN THE RIGHT RANK
SELECT zip, be_no, sic,score, pp_bid, city, state, distance, identity(int, 1, 1) as rank, 0 as total
INTO #TempRankings
FROM #TempRankings2
--ONLY THEN CAN WE INSERT THE TOTAL NUMBER
INSERT INTO def_rankings (zip, be_no, sic, score, pp_bid, city, state, distance, rank, total)
SELECT zip, be_no, sic, score, pp_bid, city, state, distance, rank, @@IDENTITY
FROM #TempRankings
DROP TABLE #TempRankings
DROP TABLE #TempRankings2
FETCH NEXT FROM MyCursor INTO @zip, @sic
END
DROP TABLE #temp_map_city2zip
--REBUILD THE INDEX
CREATE CLUSTERED INDEX [Index_BE] ON [dbo].[def_rankings]([be_no], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]
DEALLOCATE MyCursor
GO
Thanks again for any help you can offer.
Sincerely,
Danny

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

There is probably a set based solution to this process, but I don't see it yet. One change I think may help is in the declaration of the cursor. Try this and see if it produces the same results for your cursor:

DECLARE MyCursor CURSOR FAST_FORWARD FOR

-- Select all mapped city-zips crossed with all industries

SELECT DISTINCT

c2z.zip,

zf.sic

FROM

#temp_map_city2zip c2z

INNER JOIN zipfindzips2 zfz

on (c2z.zip = zfz.zip )

INNER JOIN zipfind zf

on (zfz.adjacentzip = zf.zip)

INNER JOIN zipsupdated zu

on (c2z.zip = zu.zip)

It eliminates the subquery and may help. All you can do is test, test, and test some more. I am sure others will look at this may provide other alternatives.

Morpheus-173846 Valued Member Points: 57 More actions · Answer 2

Try to run a profiler when running your procedure in order to see which part of your sp is resource consuming.

Hope this helps

By the way Group by is known to be resource consuming

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 3

Hi Danny,

Another thing that you could try is to created indexes on your temp tables this may help the query joins the tables start to get bigger and bigger.

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

Also, If you have to use a cursor (no one comes up with a set-based solution), you should try creating the the two # temp tables you use inside your while loop outside the loop and use insert into and truncate on the tables in the loop, then drop the # temp tables after the loop is done (or just let SQL Server drop them when the stored procedure finishes).

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 5

I would have to agree with Lynn on this one.

How may loops is your cursor doing?

It's not recomended to user cursors all the time as they have been know to lock up memory and records.

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

It is desireable to eliminate cursors if possible. Sometimes, however, they are the proper choice for a specific task. Not having your data handy to test against, it is difficult to know. As I look at your procedure I think I see a way to eliminate the cursor, but it may not work in your situation as it appears you may be using SQL Server 2000. If true, are you or your company looking at upgradingto SQL Server 2005 in the foreseable future?

dzigrino SSC Veteran Points: 264 More actions · Answer 7

Thanks! For the great posts. I am using SQL2000 and as far as I know there are no plans to update. The cursor currently loops over about 120,000 records, and that is bound to grow. My non-cursor version looks something like this excerpt:

SELECT distinct c2z.zip, zf.sic, identity(int, 1, 1) as ID

INTO #zipsic

FROM #temp_map_city2zip c2z INNER JOIN zipfindzips2 zfz on c2z.zip = zfz.zip

inner join zipfind zf on zfz.adjacentzip = zf.zip

and c2z.zip IN (SELECT zip FROM zipsupdated)

WHILE EXISTS (SELECT TOP 1 * FROM #zipsic)

BEGIN

SELECT TOP 1 @id = ID FROM #zipsic

--THE FIRST TEMP INSERT IS USED TO GET THE RIGHT ORDER

SELECT zfz.zip, be_no, zs.sic,(points1 + 60-distance) as score, pp_bid, c.city, c.state, distance

INTO #TempRankings2

FROM zipfind z INNER JOIN zipfindzips2 zfz ON z.zip = zfz.adjacentzip

inner join #temp_map_city2zip c on c.zip = zfz.zip

inner join #zipsic zs on zs.zip = c.zip and zs.sic = z.sic and zs.id = @id

WHERE distance <= 60

ORDER BY pp_bid desc, score desc, distance, be_no

--ONLY THEN CAN WE ASSIGN THE RIGHT RANK

SELECT zip, be_no, sic,score, pp_bid, city, state, distance, identity(int, 1, 1) as rank, @@ROWCOUNT as total

INTO #TempRankings

FROM #TempRankings2

--ONLY THEN CAN WE INSERT THE TOTAL NUMBER

INSERT INTO def_rankings (zip, be_no, sic, score, pp_bid, city, state, distance, rank, total)

SELECT zip, be_no, sic, score, pp_bid, city, state, distance, rank, total --@@IDENTITY

FROM #TempRankings

DROP TABLE #TempRankings

DROP TABLE #TempRankings2

DELETE FROM #zipsic WHERE ID = @id

END

Even though I am able to get rid of the cursor as shown above, I still have to loop over the records one at a time. I guess that does not qualify as a set-based treatment.

Morpheus, I haven't used the profiler before so I'll read up on how to do so. This sounds like it could be a big help in figuring out what is going on. Even though I conceptually do the rankings by group, I don't actually use the GROUP keyword in the procedure. But definitely good to know!

Lynn, I will take your improved select for a test run. I would love to hear if you think of a way to do this task set-wise. Also, I am not sure I can use INSERT INTO on both temp tables because I get an error saying IDENTITY() can only be used with a SELECT with an INTO clause. But I will use this hint wherever it lets me. Just need to look up how to declare a temp table before actually using it.

Christopher, I will also try indexes on the temp tables. But the only table that should be getting larger is the def_rankings table, and it is only used for INSERTS. Since the #temp_map_city2zip and the cursor do not change, and #temprankings and #temprankings2 are dropped each time through the cursor, I wonder if this will solve the slowing problem. No doubt that it will help shave time off the cursor overall, though.

Thanks again for the great hints! Let me know if there is some set-related solution that someone can envision.

Danny

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

You can define the identity column in the create table when creating the # temp tables, then you won't need the IDENTITY() function in the insert statement. This will also be reset when you truncate the # temp table at the end of each iteration of the loop.

dzigrino SSC Veteran Points: 264 More actions · Answer 9

dzigrino

SSC Veteran

Points: 264

July 11, 2007 at 10:12 am

#718699

Sweet! What a great idea Lynn!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

Also, you may want to put an index on this # temp table: #temp_map_city2zip.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

One last change to suggest. In your procedure you are using @@IDENTITY, you may want to change that to SCOPE_IDENTITY(). If, during your processing an insert to another table with an identity field occurs you may get that value instead of the one you expect.

You may want to read up on these in BOL.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

Could you post the DDL for the tables used in the procedure and some sample data, perhapes enough for 3 quick iterations of representative data?

dzigrino SSC Veteran Points: 264 More actions · Answer 13

Hi Lynn, Sorry for the slow response. I am not sure how to get you the info you requested except to paste it into the body. I cannot figure out how to attach files. These tables are all large (even for 3 zips) so I will separate the data into different posts for each table. I just output csv files so that is what I am sending. If there is another way to get a scripted set of data, I am unfamiliar with how to do that.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[map_city2zip]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

drop table [dbo].[map_city2zip]

GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipFindZips2]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

drop table [dbo].[ZipFindZips2]

GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipsUpdated]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

drop table [dbo].[ZipsUpdated]

GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipFind]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

drop table [dbo].[ZipFind]

GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[def_rankings]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

drop table [dbo].[def_rankings]

GO

CREATE TABLE [dbo].[map_city2zip] (

[city] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[state] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[zip] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[ZipFindZips2] (

[zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[adjacentZip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[distance] [tinyint] NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[ZipsUpdated] (

[zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[getdate] [smalldatetime] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[ZipFind] (

[BE_NO] [int] NOT NULL ,

[zip] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[points1] [float] NULL ,

[points2] [float] NULL ,

[points3] [float] NULL ,

[sic] [int] NOT NULL ,

[tid] [int] NOT NULL ,

[pp_bid] [money] NOT NULL ,

[dateCreated] [smalldatetime] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[def_rankings] (

[zip] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[be_no] [int] NOT NULL ,

[sic] [int] NOT NULL ,

[score] [int] NOT NULL ,

[pp_bid] [money] NULL ,

[city] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[state] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[distance] [int] NULL ,

[rank] [int] NULL ,

[total] [int] NULL ,

[dateCreated] [smalldatetime] NOT NULL

) ON [PRIMARY]

GO

ALTER TABLE [dbo].[map_city2zip] WITH NOCHECK ADD

CONSTRAINT [PK_map_city2zip] PRIMARY KEY CLUSTERED

(

[city],

[state]

) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

ALTER TABLE [dbo].[ZipFindZips2] WITH NOCHECK ADD

CONSTRAINT [PK_ZipFindZips2] PRIMARY KEY CLUSTERED

(

[zip],

[adjacentZip]

) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

ALTER TABLE [dbo].[ZipsUpdated] WITH NOCHECK ADD

CONSTRAINT [PK_ZipsUpdated] PRIMARY KEY CLUSTERED

(

[zip]

) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

ALTER TABLE [dbo].[ZipFind] WITH NOCHECK ADD

CONSTRAINT [PK_ZipFind] PRIMARY KEY CLUSTERED

(

[BE_NO],

[zip],

[sic],

[tid]

) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

ALTER TABLE [dbo].[ZipsUpdated] WITH NOCHECK ADD

CONSTRAINT [DF_ZipsUpdated_getdate] DEFAULT (getdate()) FOR [getdate]

GO

ALTER TABLE [dbo].[ZipFind] WITH NOCHECK ADD

CONSTRAINT [DF__zipFind__pp_bid__6E4C3B47] DEFAULT (0) FOR [pp_bid],

CONSTRAINT [DF__zipFind__dateCre__6F405F80] DEFAULT (getdate()) FOR [dateCreated]

GO

ALTER TABLE [dbo].[def_rankings] WITH NOCHECK ADD

CONSTRAINT [DF_danny_dateCreated] DEFAULT (getdate()) FOR [dateCreated]

GO

CREATE INDEX [Index_ZipFindZips2_zip] ON [dbo].[ZipFindZips2]([zip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Index_ZipFindZips2_adj_zip_distance] ON [dbo].[ZipFindZips2]([adjacentZip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Index_ZipFindZips2_all] ON [dbo].[ZipFindZips2]([zip], [adjacentZip], [distance]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Index_ZipFindZip2_zip_only] ON [dbo].[ZipFindZips2]([zip]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

dzigrino SSC Veteran Points: 264 More actions · Answer 14

map_city2zip

==================

miami,FL,33101

miami beach,FL,33109

new york,NY,10001

new york city,NY,10001

ny,NY,10001

nyc,NY,10001