January 11, 2006 at 10:46 am
Hi folks,
I have two tables with over 50,000,000 rows each. These two tables are updated every day. They don't have no index.
I need to do the following process every day:
1. Create two new tables by inner joining the two tables. The new tables' size is roughly 200,000 rows.
2. Then I create two cursors to analyze the two new tables.
This process takes me 3 hours to go through one day's records in a stored procedure. Sometimes it takes me half an hour to go through 3 days' records.
Do you have any good suggestion to improve the performance?
Thanks for any hint.
R.X.
January 11, 2006 at 11:21 am
>>They don't have no index.
I think it's pretty obvious what you need to do.
- Add index(es) to support the joins, and add a clustered index so you don't have two large fragmented table heaps (see the sketch below)
- Don't use cursors if there's an equivalent set-based operation
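For the indexes, something along these lines. This is purely a sketch with hypothetical table and column names, since you haven't posted your schema:

-- Hypothetical names; substitute your real tables and join columns
CREATE CLUSTERED INDEX IX_BigTable1_JoinCols ON dbo.BigTable1 (JoinCol1, JoinCol2)
CREATE NONCLUSTERED INDEX IX_BigTable2_JoinCols ON dbo.BigTable2 (JoinCol1, JoinCol2)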
January 11, 2006 at 12:05 pm
Sorry, I have a typo in my posting: >>They don't have no index. should be
>>They don't have index.
What do you mean by "Don't use cursors if there's an equivalent set-based operation"? I have to fetch each row from the first cursor, open the second cursor, go through it, close it, then fetch the next record from the first cursor, open the second cursor again, and so on until all the records in the first cursor have been processed.
When I looked into the run times, roughly half the time goes to generating the two new tables, and the rest goes to the cursors.
Thanks.
January 11, 2006 at 12:11 pm
>>I have to fetch each row from the first cursor, and start the second cursor, go through it, close it,
You don't have to. Get your head out of the procedural loops-within-loops mindset and use a SQL JOIN instead.
January 11, 2006 at 12:40 pm
Could you describe your solution in more detail, then?
January 11, 2006 at 12:50 pm
Not possible without more details on the problem.
January 11, 2006 at 1:03 pm
Thank you for your quick response.
What info are you looking for? I stated it in my first post. Here is the code:
DELETE FROM dbo.[Latest Mail Received]

INSERT INTO dbo.[Latest Mail Received]
SELECT DISTINCT Server, MSGID, RDate
FROM dbo.[table_received]
WHERE [RDATE] >= @LastDate AND [RDATE] < @NewDate

DELETE FROM dbo.[Latest Mail Sent]

INSERT INTO dbo.[Latest Mail Sent]
SELECT ts.SERVER, ts.MSGID, ts.RDATE, ts.SENDER
FROM dbo.[table_sent] ts
INNER JOIN dbo.[Latest Mail Received] lmr
    ON ts.Server = lmr.SERVER AND ts.Msgid = lmr.MSGID
DECLARE received_cursor CURSOR FOR
    SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Received]

OPEN received_cursor

FETCH NEXT FROM received_cursor
    INTO @server, @MsgID, @RDate

-- Outer loop: one iteration per received message
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @lFirstMsg = 1

    DECLARE sent_cursor CURSOR FOR
        SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Sent]
        WHERE Server = @server AND MsgID = @MsgID

    OPEN sent_cursor

    FETCH NEXT FROM sent_cursor
        INTO @SentServer, @SentMsgID, @SentRDate

    -- Inner loop: all sent rows matching the current received row
    WHILE @@FETCH_STATUS = 0
    BEGIN
        ...

        -- Fetch the next sent row inside the loop
        FETCH NEXT FROM sent_cursor
            INTO @SentServer, @SentMsgID, @SentRDate
    END

    CLOSE sent_cursor
    DEALLOCATE sent_cursor

    FETCH NEXT FROM received_cursor
        INTO @server, @MsgID, @RDate
END

CLOSE received_cursor
DEALLOCATE received_cursor
January 11, 2006 at 1:08 pm
Tables [table_received] and [table_sent] have 50,000,000 rows each.
The newly created tables [Latest Mail Received] and [Latest Mail Sent] have roughly 20,000 rows each.
January 11, 2006 at 1:09 pm
>> DELETE FROM dbo.[Latest Mail Received]
If you are emptying the entire table, use TRUNCATE TABLE instead of DELETE for improved performance.
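For example:

TRUNCATE TABLE dbo.[Latest Mail Received]

(TRUNCATE TABLE just deallocates the data pages instead of logging every row delete; note it can't be used on a table referenced by a foreign key.)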
The cursor logic appears to be a simple two-table SQL JOIN:
SELECT *
FROM dbo.[Latest Mail Received] As R
INNER JOIN dbo.[Latest Mail Sent] As S
ON ( R.Server = S.Server AND
R.MsgID = S.MsgID)
What you do with the result set is unknown, since all you provided was "..."
January 11, 2006 at 1:23 pm
Thanks. Could you check your PM? I've sent you the whole script.
January 11, 2006 at 3:07 pm
Well, that's too much code to wade through, but I doubt removing the cursors is an option unless you have the time & budget for a major re-write.
Inside the cursor you have multiple code chunks like this:
IF EXISTS(SELECT * FROM dbo.[Time Analysis] WHERE SERVER = @server AND [DATE] = CONVERT(varchar(10),@RDate,101) AND [HOUR] = Cast(SubString(CONVERT(varchar(32),@RDate,114), 1, 2) as int))
What size is the [Time Analysis] table? By using functions like Convert() etc., you prevent SQL from using any indexes on that table.
Also, in many of your UPDATE statements you use WHERE clauses with functions, and again you prevent SQL from using indexes.
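A non-sargable date filter can usually be rewritten as a range test against the bare column. A sketch of the idea, assuming an UPDATE that buckets [RDATE] by calendar day:

-- Instead of (non-sargable -- the function on the column defeats any index):
--   WHERE CONVERT(varchar(10), [RDATE], 101) = CONVERT(varchar(10), @RDate, 101)
-- precompute the day boundaries once and compare the bare column:
DECLARE @DayStart datetime, @DayEnd datetime
SET @DayStart = CONVERT(datetime, CONVERT(varchar(10), @RDate, 101))  -- midnight of @RDate's day
SET @DayEnd = DATEADD(day, 1, @DayStart)
-- ... then in the WHERE clause:
-- WHERE [RDATE] >= @DayStart AND [RDATE] < @DayEnd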
Also, if you're doing an IF EXISTS ... to determine whether or not to UPDATE or INSERT, and you're using the same WHERE clause in the IF EXISTS and the UPDATE, you're doing double the required querying.
Just do the UPDATE and check @@ROWCOUNT - if @@ROWCOUNT is zero, your UPDATE WHERE clause didn't find a match, so do the INSERT.
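In outline it looks like this; the [Msg Count] column and the SET clause are stand-ins, since only the IF EXISTS line was quoted. @Date and @Hour would be computed once per row, which also avoids repeating the Convert() calls from the IF EXISTS:

UPDATE dbo.[Time Analysis]
SET [Msg Count] = [Msg Count] + 1   -- stand-in for whatever the script really updates
WHERE SERVER = @server AND [DATE] = @Date AND [HOUR] = @Hour

IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO dbo.[Time Analysis] (SERVER, [DATE], [HOUR], [Msg Count])
    VALUES (@server, @Date, @Hour, 1)
END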
January 12, 2006 at 1:04 am
You may find a performance increase by changing the 1st INSERT INTO query to:
SELECT Server, MSGID, RDate FROM dbo.[table_received] WHERE [RDATE] >= @LastDate And [RDATE] < @NewDate GROUP BY Server, MSGID, RDate
Most of the time, GROUP BY is faster than DISTINCT.
Andy
January 12, 2006 at 6:17 am
Hello,
If you do many single-row updates, it's very slow. Another way is to collect the values in a temporary table and do one global update with an inner join.
Could you send me the code via PM?
January 12, 2006 at 8:55 am
OK, here is what I would do:
- rather than running multiple UPDATE statements, insert the new values into a temporary table, one row at a time
- at the end of the job, do one global UPDATE with an inner join clause
This is dramatically faster!
In my own case, this method cut the processing time by a factor of one hundred...
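Roughly like this, with hypothetical table and column names since I haven't seen which UPDATEs dominate your script:

-- 1) Collect the new values (row by row inside the loop, or better, in one set-based INSERT)
CREATE TABLE #NewValues (Server varchar(32), MsgID varchar(64), NewValue int)

INSERT INTO #NewValues (Server, MsgID, NewValue)
VALUES (@server, @MsgID, @Value)

-- 2) At the end of the job, apply everything in one joined UPDATE
UPDATE tgt
SET tgt.SomeColumn = nv.NewValue
FROM dbo.[Your Target Table] tgt
INNER JOIN #NewValues nv
    ON nv.Server = tgt.Server AND nv.MsgID = tgt.MsgID

DROP TABLE #NewValues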
January 12, 2006 at 9:00 am
Hi Andy,
I tested your GROUP BY option this morning: the original query took 00:04:58, while yours took 00:06:18. Is there anything wrong with the GROUP BY?