January 11, 2006 at 10:46 am
Hi folks,
I have two tables with over 50,000,000 rows each. These two tables are updated every day. They don't have no index.
I need to do the following process every day:
1. Create two new tables by inner joining the two tables. The new tables' size is roughly 200,000 rows.
2. Then I create two cursors to analyze the two new tables.
This process takes me 3 hours to go through one day's records in a stored procedure. Sometimes it takes me half an hour to go through 3 days' records.
Do you have any good suggestion to improve the performance?
Thanks for any hint.
R.X.
January 11, 2006 at 11:21 am
>>They don't have no index.
I think it's pretty obvious what you need to do.
- Add index(es) to support the joins, and add a clustered index so you don't have two large fragmented table heaps (see the sketch below)
- Don't use cursors if there's an equivalent set-based operation
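For the indexes, something along these lines. This is purely a sketch with hypothetical table and column names, since you haven't posted your schema:

-- Hypothetical names; substitute your real tables and join columns
CREATE CLUSTERED INDEX IX_BigTable1_JoinCols ON dbo.BigTable1 (JoinCol1, JoinCol2)
CREATE NONCLUSTERED INDEX IX_BigTable2_JoinCols ON dbo.BigTable2 (JoinCol1, JoinCol2)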
January 11, 2006 at 12:05 pm
Sorry, I have a typo in my posting: >>They don't have no index. should be
>>They don't have index.
What do you mean by "Don't use cursors if there's an equivalent set-based operation"? I have to fetch each row from the first cursor, open the second cursor, go through it, close it, then fetch the next record from the first cursor, open the second cursor again, and so on until all the records in the first cursor have been processed.
When I looked into the run times, roughly half the time goes to generating the two new tables, and the rest goes to the cursors.
Thanks.
January 11, 2006 at 12:11 pm
>>I have to fetch each row from the first cursor, and start the second cursor, go through it, close it,
You don't have to. Get your head out of the procedural loops-within-loops mindset and use a SQL JOIN instead.
January 11, 2006 at 12:40 pm
Could you describe your solution in more detail, then?
January 11, 2006 at 12:50 pm
Not possible without more details on the problem.
January 11, 2006 at 1:03 pm
Thank you for your quick response.
What info are you looking for? I stated it in my first post. Here is the code:
DELETE FROM dbo.[Latest Mail Received]

INSERT INTO dbo.[Latest Mail Received]
SELECT DISTINCT Server, MSGID, RDate
FROM dbo.[table_received]
WHERE [RDATE] >= @LastDate AND [RDATE] < @NewDate

DELETE FROM dbo.[Latest Mail Sent]

INSERT INTO dbo.[Latest Mail Sent]
SELECT ts.SERVER, ts.MSGID, ts.RDATE, ts.SENDER
FROM dbo.[table_sent] ts
INNER JOIN dbo.[Latest Mail Received] lmr
    ON ts.Server = lmr.SERVER AND ts.Msgid = lmr.MSGID
DECLARE received_cursor CURSOR FOR
    SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Received]

OPEN received_cursor

FETCH NEXT FROM received_cursor
    INTO @server, @MsgID, @RDate

-- Outer loop: one iteration per received message
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @lFirstMsg = 1

    DECLARE sent_cursor CURSOR FOR
        SELECT Server, MSGID, RDate FROM dbo.[Latest Mail Sent]
        WHERE Server = @server AND MsgID = @MsgID

    OPEN sent_cursor

    FETCH NEXT FROM sent_cursor
        INTO @SentServer, @SentMsgID, @SentRDate

    -- Inner loop: all sent rows matching the current received row
    WHILE @@FETCH_STATUS = 0
    BEGIN
        ...

        -- Fetch the next sent row inside the loop
        FETCH NEXT FROM sent_cursor
            INTO @SentServer, @SentMsgID, @SentRDate
    END

    CLOSE sent_cursor
    DEALLOCATE sent_cursor

    FETCH NEXT FROM received_cursor
        INTO @server, @MsgID, @RDate
END

CLOSE received_cursor
DEALLOCATE received_cursor
January 11, 2006 at 1:08 pm
Tables [table_received] and [table_sent] have 50,000,000 rows each.
The newly created tables [Latest Mail Received] and [Latest Mail Sent] have roughly 20,000 rows each.
January 11, 2006 at 1:09 pm
>> DELETE FROM dbo.[Latest Mail Received]
If you are emptying the entire table, use TRUNCATE TABLE instead of DELETE for improved performance.
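For example:

TRUNCATE TABLE dbo.[Latest Mail Received]

(TRUNCATE TABLE just deallocates the data pages instead of logging every row delete; note it can't be used on a table referenced by a foreign key.)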
The cursor logic appears to be a simple two-table SQL JOIN:
SELECT *
FROM dbo.[Latest Mail Received] As R
INNER JOIN dbo.[Latest Mail Sent] As S
ON ( R.Server = S.Server AND
R.MsgID = S.MsgID)
What you do with the result set is unknown, since all you provided was "..."
January 11, 2006 at 1:23 pm
Thanks. Could you check your PM? I've sent you the whole script.
January 11, 2006 at 3:07 pm
Well, that's too much code to wade through, but I doubt removing the cursors is an option unless you have the time & budget for a major re-write.
Inside the cursor you have multiple code chunks like this:
IF EXISTS(SELECT * FROM dbo.[Time Analysis] WHERE SERVER = @server AND [DATE] = CONVERT(varchar(10),@RDate,101) AND [HOUR] = Cast(SubString(CONVERT(varchar(32),@RDate,114), 1, 2) as int))
What size is the [Time Analysis] table? By using functions like Convert() etc., you prevent SQL from using any indexes on that table.
Also, in many of your UPDATE statements you use WHERE clauses with functions, and again you prevent SQL from using indexes.
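A non-sargable date filter can usually be rewritten as a range test against the bare column. A sketch of the idea, assuming an UPDATE that buckets [RDATE] by calendar day:

-- Instead of (non-sargable -- the function on the column defeats any index):
--   WHERE CONVERT(varchar(10), [RDATE], 101) = CONVERT(varchar(10), @RDate, 101)
-- precompute the day boundaries once and compare the bare column:
DECLARE @DayStart datetime, @DayEnd datetime
SET @DayStart = CONVERT(datetime, CONVERT(varchar(10), @RDate, 101))  -- midnight of @RDate's day
SET @DayEnd = DATEADD(day, 1, @DayStart)
-- ... then in the WHERE clause:
-- WHERE [RDATE] >= @DayStart AND [RDATE] < @DayEnd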
Also, if you're doing an IF EXISTS ... to determine whether or not to UPDATE or INSERT, and you're using the same WHERE clause in the IF EXISTS and the UPDATE, you're doing double the required querying.
Just do the UPDATE and check @@ROWCOUNT - if @@ROWCOUNT is zero, your UPDATE WHERE clause didn't find a match, so do the INSERT.
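In outline it looks like this; the [Msg Count] column and the SET clause are stand-ins, since only the IF EXISTS line was quoted. @Date and @Hour would be computed once per row, which also avoids repeating the Convert() calls from the IF EXISTS:

UPDATE dbo.[Time Analysis]
SET [Msg Count] = [Msg Count] + 1   -- stand-in for whatever the script really updates
WHERE SERVER = @server AND [DATE] = @Date AND [HOUR] = @Hour

IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO dbo.[Time Analysis] (SERVER, [DATE], [HOUR], [Msg Count])
    VALUES (@server, @Date, @Hour, 1)
END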
January 12, 2006 at 1:04 am
You may find a performance increase by changing the 1st INSERT INTO query to:
SELECT Server, MSGID, RDate FROM dbo.[table_received] WHERE [RDATE] >= @LastDate And [RDATE] < @NewDate GROUP BY Server, MSGID, RDate
Most of the time, GROUP BY is faster than DISTINCT.
Andy
January 12, 2006 at 6:17 am
Hello,
If you do many single-row updates, it's very slow. Another way is to collect the values in a temporary table and do one global update with an inner join.
Could you send me the code via PM?
January 12, 2006 at 8:55 am
OK, here is what I would do:
- rather than running multiple UPDATE statements, insert the new values into a temporary table, one row at a time
- at the end of the job, do one global UPDATE with an inner join clause
This is dramatically faster!
In my own case, this method cut the processing time by a factor of one hundred...
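Roughly like this, with hypothetical table and column names since I haven't seen which UPDATEs dominate your script:

-- 1) Collect the new values (row by row inside the loop, or better, in one set-based INSERT)
CREATE TABLE #NewValues (Server varchar(32), MsgID varchar(64), NewValue int)

INSERT INTO #NewValues (Server, MsgID, NewValue)
VALUES (@server, @MsgID, @Value)

-- 2) At the end of the job, apply everything in one joined UPDATE
UPDATE tgt
SET tgt.SomeColumn = nv.NewValue
FROM dbo.[Your Target Table] tgt
INNER JOIN #NewValues nv
    ON nv.Server = tgt.Server AND nv.MsgID = tgt.MsgID

DROP TABLE #NewValues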
January 12, 2006 at 9:00 am
Hi Andy,
I tested your GROUP BY option this morning: the original query took 00:04:58, while yours took 00:06:18. Is there anything wrong with the GROUP BY?