Terrible performance needs to be improved

  • Hi folks,

    I have two tables with over 50 000 000 rows each. These two tables are updated every day. They don't have no index.

    I need to do the following process every day:

    1. Create two new tables from the two big tables, one via a date filter and one via an inner join. Each new table's size is roughly 200 000 rows.

    2. Then I create two cursors to analyze the two new tables.

    This process takes me 3 hours to go through one day's records in a stored procedure. Sometimes it takes me half an hour to go through 3 days' records.

    Do you have any good suggestions for improving the performance?

    Thanks for any hint.

    R.X.

  • >>They don't have no index.

    I think it's pretty obvious what you need to do.

    - Add index(es) to support the joins, and add a clustered index to prevent two large fragmented table heaps (a sketch follows below)

    - Don't use cursors if there's an equivalent set-based operation
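    A minimal sketch of the indexing suggestion, assuming the column names that appear later in the thread (RDATE as the daily filter column, SERVER and MSGID as the join keys); the index names are illustrative:

        -- Clustered index on the date column supports the daily range filter
        -- and removes the fragmented heap (index names are illustrative):
        CREATE CLUSTERED INDEX IX_received_RDate
            ON dbo.[table_received] (RDATE)

        -- Nonclustered index to support the Server/MsgID join:
        CREATE NONCLUSTERED INDEX IX_sent_Server_MsgID
            ON dbo.[table_sent] (SERVER, MSGID)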

    Sorry, I had a typo in my post: >>They don't have no index. should be

    >>They don't have index.

    What do you mean by "Don't use cursors if there's an equivalent set-based operation"? I have to fetch each row from the first cursor, and start the second cursor, go through it, close it, then fetch the second record from the first cursor, start the second cursor again, and so on until all the records in the first cursor have been processed.

    When I looked into the run, roughly half of the time is spent generating the two new tables, and the rest is spent on the cursors.

    Thanks.

  • >>I have to fetch each row from the first cursor, and start the second cursor, go through it, close it,

    You don't have to. Get your head out of the procedural loops-within-loops mindset and use a SQL JOIN instead.

    Could you describe your solution in more detail, then?

  • Not possible without more details on the problem.

    Thank you for your quick response.

    What info are you looking for? I stated it in my first post. Here is the code:

        -- @LastDate, @NewDate, @server, @MsgID, @RDate, @SentServer,
        -- @SentMsgID, @SentRDate and @lFirstMsg are declared earlier
        -- in the stored procedure.

        -- Rebuild the two small work tables for the day:
        DELETE FROM dbo.[Latest Mail Received]

        INSERT INTO dbo.[Latest Mail Received]
        SELECT DISTINCT Server, MSGID, RDate
        FROM dbo.[table_received]
        WHERE [RDATE] >= @LastDate AND [RDATE] < @NewDate

        DELETE FROM dbo.[Latest Mail Sent]

        INSERT INTO dbo.[Latest Mail Sent]
        SELECT ts.SERVER, ts.MSGID, ts.RDATE, ts.SENDER
        FROM dbo.[table_sent] ts
        INNER JOIN dbo.[Latest Mail Received] lmr
            ON ts.Server = lmr.SERVER AND ts.Msgid = lmr.MSGID

        -- Outer cursor: one row per received message.
        DECLARE received_cursor CURSOR FOR
        SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Received]

        OPEN received_cursor

        FETCH NEXT FROM received_cursor
        INTO @server, @MsgID, @RDate

        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @lFirstMsg = 1

            -- Inner cursor: the sent messages matching the current row.
            DECLARE sent_cursor CURSOR FOR
            SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Sent]
            WHERE Server = @server AND MsgID = @MsgID

            OPEN sent_cursor

            FETCH NEXT FROM sent_cursor
            INTO @SentServer, @SentMsgID, @SentRDate

            WHILE @@FETCH_STATUS = 0
            BEGIN
                ...

                FETCH NEXT FROM sent_cursor
                INTO @SentServer, @SentMsgID, @SentRDate
            END

            CLOSE sent_cursor
            DEALLOCATE sent_cursor

            FETCH NEXT FROM received_cursor
            INTO @server, @MsgID, @RDate
        END

        CLOSE received_cursor
        DEALLOCATE received_cursor

    Tables [table_received] and [table_sent] have 50 000 000 rows each.

    The newly created tables, [Latest Mail Received] and [Latest Mail Sent], have roughly 20 000 rows each.

  • >> DELETE FROM dbo.[Latest Mail Received]

    Use TRUNCATE TABLE instead of DELETE for improved performance, if you are emptying the entire table.
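    For the two work tables here, that would be:

        -- TRUNCATE TABLE deallocates pages and is minimally logged,
        -- so it's much cheaper than a row-by-row DELETE:
        TRUNCATE TABLE dbo.[Latest Mail Received]
        TRUNCATE TABLE dbo.[Latest Mail Sent]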

    The cursor appears to be a simple 2 table SQL JOIN:

    SELECT *
    FROM dbo.[Latest Mail Received] AS R
    INNER JOIN dbo.[Latest Mail Sent] AS S
        ON (R.Server = S.Server AND R.MsgID = S.MsgID)

    What you do with the result set is unknown, since all you provided was "..."

    Thanks. Could you check your PM? I've sent you the whole script.

    Well, that's too much code to wade through, but I doubt removing the cursors is an option unless you have the time and budget for a major rewrite.

    Inside the cursor you have multiple code chunks like this:

    IF EXISTS (SELECT *
               FROM dbo.[Time Analysis]
               WHERE SERVER = @server
                 AND [DATE] = CONVERT(varchar(10), @RDate, 101)
                 AND [HOUR] = CAST(SUBSTRING(CONVERT(varchar(32), @RDate, 114), 1, 2) AS int))

    What size is the [Time Analysis] table? By using functions like CONVERT() etc., you prevent SQL from using any indexes on that table.

    Also, many of your UPDATE statements use WHERE clauses with functions, and again you prevent SQL from using indexes.
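    One way to apply that advice is to compute the converted values once into variables and compare the columns against those (a sketch; the variable names are illustrative):

        -- Compute the date and hour parts of @RDate once, up front,
        -- instead of calling CONVERT()/SUBSTRING() in every WHERE clause
        -- (variable names are illustrative):
        DECLARE @RDateDay varchar(10), @RDateHour int
        SET @RDateDay  = CONVERT(varchar(10), @RDate, 101)
        SET @RDateHour = CAST(SUBSTRING(CONVERT(varchar(32), @RDate, 114), 1, 2) AS int)

        -- The WHERE clauses then become plain column-to-variable comparisons:
        -- WHERE SERVER = @server AND [DATE] = @RDateDay AND [HOUR] = @RDateHour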

    Also, if you're doing an IF EXISTS ... to determine whether to UPDATE or to INSERT, and you're using the same WHERE clause in the IF EXISTS and the UPDATE, you're doing double the required querying.

    Just do the UPDATE and check @@ROWCOUNT - if @@ROWCOUNT is zero, your UPDATE WHERE clause didn't find a match, so do the INSERT.
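    A minimal sketch of that pattern against [Time Analysis]; the [Total] column and the precomputed @RDateDay/@RDateHour variables are illustrative assumptions, since only SERVER, [DATE] and [HOUR] appear in the thread:

        -- Try the UPDATE first; @@ROWCOUNT reports how many rows matched.
        -- ([Total] is an assumed example column, not from the thread.)
        UPDATE dbo.[Time Analysis]
        SET [Total] = [Total] + 1
        WHERE SERVER = @server
          AND [DATE] = @RDateDay
          AND [HOUR] = @RDateHour

        -- No match means the row doesn't exist yet, so INSERT it:
        IF @@ROWCOUNT = 0
            INSERT INTO dbo.[Time Analysis] (SERVER, [DATE], [HOUR], [Total])
            VALUES (@server, @RDateDay, @RDateHour, 1)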

  • You may find a performance increase by changing the 1st INSERT INTO query to:

    SELECT Server, MSGID, RDate
    FROM dbo.[table_received]
    WHERE [RDATE] >= @LastDate AND [RDATE] < @NewDate
    GROUP BY Server, MSGID, RDate

    Most of the time, GROUP BY is faster than DISTINCT.

    Andy

  • Hello,

    If you do multiple updates, it's very slow. Another way to do them is to use a temporary table and make one global update with an inner join.

    Could you send me the code via PM?

    Ok, what I would do is the following:

    - rather than running multiple UPDATE statements, I would insert the values into a temporary table, one row per update

    - at the end of the job, I would do one global UPDATE with an inner join clause (a sketch follows below)

    This is dramatically faster!

    In my own case, this method cut the processing time by a factor of one hundred...
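    A minimal sketch of the pattern, assuming it's the [Time Analysis] updates being batched; the #PendingUpdates table and its columns are illustrative:

        -- Created once, before the cursor loop
        -- (#PendingUpdates and [Total] are illustrative names):
        CREATE TABLE #PendingUpdates
            (SERVER varchar(128), [DATE] varchar(10), [HOUR] int, [Total] int)

        -- Inside the loop, each per-row UPDATE becomes an INSERT INTO #PendingUpdates.

        -- At the end of the job: one set-based UPDATE with an inner join.
        UPDATE ta
        SET ta.[Total] = ta.[Total] + p.[Total]
        FROM dbo.[Time Analysis] ta
        INNER JOIN #PendingUpdates p
            ON  ta.SERVER = p.SERVER
            AND ta.[DATE] = p.[DATE]
            AND ta.[HOUR] = p.[HOUR]

        DROP TABLE #PendingUpdates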

    Hi Andy,

    I tested your GROUP BY option this morning: the original query takes 00:04:58, yours takes 00:06:18. Is there anything wrong with the GROUP BY?
