April 27, 2011 at 1:34 pm
begin Tran
declare @forward table
(rowId int identity(1,1),
ID int,
CHAR_1 Varchar(20),
CHAR_4 Varchar(20))
declare @rowId int,@maxRowId int,@ID int,
@CHAR_1 Varchar(20),
@CHAR_4 Varchar(20)
Insert into @forward(ID, CHAR_1,CHAR_4)
Select ID,CHAR_1,CHAR_4 from MESSAGE
where
CHAR_1 IN
(
SELECT CHAR_1
FROM MESSAGE
WHERE (CHAR_1 IS NOT NULL)
GROUP BY CHAR_1
HAVING (COUNT(*) > 1))
AND (CHAR_4 IS NULL)
select
@rowId = min(RowId),
@MaxRowId = max(RowId)
from @forward
while @RowId <= @MaxRowId begin
select
@ID=ID ,
@CHAR_1= CHAR_1,
@CHAR_4= CHAR_4
from @forward
where
RowId = @RowId
----------------------------------------------------------------------
/* Delete only those duplicate records which are not present in MATTERS table */
IF NOT EXISTS(SELECT * FROM MATTERS AS PM WHERE PM.ID=@ID)
BEGIN
/*Deleting the record from PROFILES table*/
DELETE FROM PROFILES WHERE PROFILES.ID=@ID
/*Deleting the record from PEOPLE_BUSINESS_UNITS table*/
DELETE FROM BUSINESS_UNITS WHERE BUSINESS_UNITS.ID=@ID
/* Deleting the record from PEOPLE_ORGANIZATIONS table*/
DELETE FROM ORGANIZATIONS WHERE ORGANIZATIONS.ID=@ID
DELETE FROM MESSAGE WHERE MESSAGE.ID=@ID
END
----------------------------------------------------------------------
set @RowId = @RowId + 1
end
I am using this query to delete duplicate records from the MESSAGE table. I have 10,000 duplicate records, and when I run this query it takes 6 hours to delete them. I need to reduce the execution time.
How can I optimize this query? How can I use JOINs here, joining the table variable with the MATTERS table? I am a bit confused on that point.
Please give your valuable suggestions to optimize this query.
Thanks in advance!!
April 27, 2011 at 2:08 pm
First, transactions take out many locks, and the chance that somebody else is waiting on one of those locks is high. Your transactions should be as short as possible. A BEGIN TRAN outside of the loop and a COMMIT (which you forgot, by the way) at the end is not the shortest possible transaction. That means: move your BEGIN TRAN to just before the first DELETE, and put your COMMIT just after the last DELETE - inside the loop.
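Roughly what I mean, reusing the body of your loop (just a sketch to show where the transaction should start and end):
-- shortest possible transaction: opened just before the first DELETE, committed right after the last one
IF NOT EXISTS (SELECT * FROM MATTERS AS PM WHERE PM.ID = @ID)
BEGIN
    BEGIN TRAN
        DELETE FROM PROFILES WHERE PROFILES.ID = @ID
        DELETE FROM BUSINESS_UNITS WHERE BUSINESS_UNITS.ID = @ID
        DELETE FROM ORGANIZATIONS WHERE ORGANIZATIONS.ID = @ID
        DELETE FROM MESSAGE WHERE MESSAGE.ID = @ID
    COMMIT
END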
Install the sp_WhoIsActive script and see what you are waiting on (a lock? the disk?).
Second, check for missing indexes on child tables: if you have any "child" tables referencing the tables you want to delete from, they should have an index on the foreign key column(s), or you will face a full-table scan of every child table, even if you delete just one row from the parent table.
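For example (hypothetical child table and column names, just to show the shape of such an index):
-- hypothetical: a child table referencing PROFILES(ID) needs an index on its foreign key column,
-- otherwise every DELETE on PROFILES scans the whole child table
CREATE INDEX IX_ChildTable_ProfileID ON ChildTable (ProfileID)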
With the sqlxdetails plugin you can check those tables and indexes in seconds (www.sqlxdetails.com).
With a SQL trace (Profiler), select the "Showplan XML" event and check whether there are any scans of large tables or indexes (yes, an index scan is a bad thing! An index seek/lookup is good). Also include the SP:StmtStarting event to see exactly which statement takes a long time.
Third, deleting one-by-one is a bad thing. There are much more efficient methods to delete many rows with a single DELETE command involving TOP N and the OUTPUT clause, followed by a @@ROWCOUNT check that ends the loop when there are no more rows. That is called "deleting in chunks". It would be better to delete everything at once if you have a data warehouse, but in a production OLTP system it is better to delete in chunks so you do not hold the locks too long and block other simultaneous users. But never one-by-one - that is the slowest method. There are also better techniques to find duplicates. But do that only if the first two suggestions (which are really quick to check) do not solve your problem.
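For illustration, a minimal sketch of deleting in chunks (the chunk size and the #DuplicateIds temp table holding the IDs to remove are assumptions, not your actual objects):
WHILE 1 = 1
BEGIN
    -- delete up to 5000 matching rows per iteration; an OUTPUT clause could capture the deleted IDs if you need them
    DELETE TOP (5000) m
    FROM MESSAGE AS m
    JOIN #DuplicateIds AS d ON d.ID = m.ID

    IF @@ROWCOUNT = 0 BREAK -- nothing left to delete: end the loop
END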
April 27, 2011 at 2:35 pm
Hi Vedran, thanks for the reply. How can I join the table variable with the MATTERS table to delete duplicates? I am hoping that joins can solve this - loading the duplicate records into the table variable and then deleting.
April 27, 2011 at 3:10 pm
I'd use a "slightly" different approach:
Instead of HAVING (COUNT(*) > 1) I'd use a CTE with the ROW_NUMBER() OVER(PARTITION BY CHAR_1 ORDER BY <insert_preferred_column> DESC) AS Row approach and query for Row > 1.
I'd insert the ID values into a staging temp table, adding an index on the ID column either after the insert or before it (and controlling the ascending order during the insert).
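Roughly like this (the ORDER BY column, the exact WHERE clause, and the #Staging name are placeholders you would need to adjust):
-- sketch: collect the IDs of the "extra" rows per CHAR_1 group into an indexed staging temp table
;WITH numbered AS
(
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY CHAR_1 ORDER BY ID DESC) AS Row
    FROM MESSAGE
    WHERE CHAR_1 IS NOT NULL -- add your CHAR_4 IS NULL / MATTERS conditions as needed
)
SELECT ID
INTO #Staging
FROM numbered
WHERE Row > 1

CREATE INDEX IX_Staging_ID ON #Staging (ID)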
Then I would use the staging table all at once:
DELETE FROM PROFILES
INNER JOIN Staging on PROFILES.ID=Staging.ID
/*Deleting the record from PEOPLE_BUSINESS_UNITS table*/
DELETE FROM BUSINESS_UNITS
INNER JOIN Staging on BUSINESS_UNITS.ID=Staging.ID
/* Deleting the record from PEOPLE_ORGANIZATIONS table*/
DELETE FROM ORGANIZATIONS
INNER JOIN Staging on ORGANIZATIONS.ID=Staging.ID
DELETE FROM MESSAGE
INNER JOIN Staging on MESSAGE.ID=Staging.ID
Should be a matter of seconds instead of hours (depending on the original table size and the indexes on those tables).
April 27, 2011 at 3:36 pm
Also, to help you discover where most of the processing time is being spent, you could do the following in SSMS:
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- run your query here
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
If you are able to let it run and include the actual execution plan, this will allow you to identify full table scans etc. that are hindering your performance.
If not, you could use the estimated plan, but that will not necessarily be the plan actually used.
MCT
MCITP Database Admin 2008
MCITP Database Dev 2008
www.jnrit.com.au/Blog.aspx
April 27, 2011 at 4:27 pm
tinnusqlserver (4/27/2011)
How can I join the table variable with the MATTERS table to delete duplicates?
I would rather use a temp table than a table variable, because a table variable does not have statistics; the optimizer assumes it has only one row and can pick a suboptimal (slow) execution plan for the SQL command.
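For example, the same structure as your @forward table variable, but as a temp table (just a sketch based on your column definitions):
CREATE TABLE #forward
(
    rowId int IDENTITY(1,1),
    ID int,
    CHAR_1 varchar(20),
    CHAR_4 varchar(20)
)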
LutzM showed you a very nice method to join in a DELETE command. Another syntax, which I prefer for DELETE and UPDATE commands with multiple joins, over
DELETE FROM PROFILES
INNER JOIN Staging on PROFILES.ID=Staging.ID
is
DELETE p -- you can replace that with "SELECT *" if you want to see the rows
FROM PROFILES p
JOIN Staging s on p.ID = s.ID
April 27, 2011 at 4:39 pm
Vedran Kesegic (4/27/2011)
tinnusqlserver (4/27/2011)
How can I join the table variable with the MATTERS table to delete duplicates?
I would rather use a temp table than a table variable, because a table variable does not have statistics; the optimizer assumes it has only one row and can pick a suboptimal (slow) execution plan for the SQL command.
LutzM showed you a very nice method to join in a DELETE command. Another syntax, which I prefer for DELETE and UPDATE commands with multiple joins, over
DELETE FROM PROFILES
INNER JOIN Staging on PROFILES.ID=Staging.ID
is
DELETE p -- you can replace that with "SELECT *" if you want to see the rows
FROM PROFILES p
JOIN Staging s on p.ID = s.ID
Yup. Good point, Vedran. I actually just copied and modified the original statement without thinking about the syntax in general. But you're definitely right.