May 22, 2016 at 1:17 pm
I am working on a statement that deletes records from one table where all 30 fields match the values of at least one record in another table. My initial statement was:
DELETE a
FROM <tablename> a
JOIN <secondtablename> b
  ON a.field1 = b.field1
 AND a.field2 = b.field2
 ...
 AND a.field30 = b.field30;
I need something more efficient, so I then tried the following, but the performance was even worse:
DELETE a
FROM <tablename> a
WHERE EXISTS
(
    SELECT a.field1, a.field2, ..., a.field30
    INTERSECT
    SELECT b.field1, b.field2, ..., b.field30
    FROM <secondtablename> b
);
The structure of the tables cannot be changed. Can someone please suggest a more efficient method to accomplish this task?
May 22, 2016 at 8:22 pm
WHY is it inefficient? Did you look at the query plan? How many records? What metrics are you using to determine that it is inefficient?
If you are deleting only a few rows out of many and you aren't seeking on an index, the table scan will hurt.
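For example, one quick way to get those numbers without actually committing the delete (a sketch only; TableA and TableB are stand-ins for the real tables):
-- Sketch: run the delete inside a transaction that is rolled back, so the
-- reads/CPU show up in the Messages tab but no rows are actually removed.
-- TableA and TableB are stand-ins for the real table names.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

DELETE a
FROM TableA a
JOIN TableB b
  ON a.Field1 = b.Field1
 AND a.Field2 = b.Field2; -- ...and so on through Field30

ROLLBACK TRANSACTION;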
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 24, 2016 at 6:21 am
A big part of this will come down to the structure of your table and the indexes in place. A delete based on thirty separate columns is going to be tough, simply because thirty columns aren't going to index terribly well. However, surely some columns are more important than others: the columns that most accurately identify the data being deleted. If those are always used, then they're the ones on which you can place an index that could speed this up.
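For example, if three of the thirty columns do most of the identifying, an index along these lines is the sort of thing that could help (a sketch only; Field1 through Field3 are hypothetical stand-ins for whichever columns are most selective):
-- Sketch: index only the genuinely selective columns on both tables so the
-- delete can seek rather than scan. Names are hypothetical stand-ins.
CREATE NONCLUSTERED INDEX IX_TableA_Selective ON TableA (Field1, Field2, Field3);
CREATE NONCLUSTERED INDEX IX_TableB_Selective ON TableB (Field1, Field2, Field3);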
However, without seeing the full query or the execution plan, these vague suggestions are the best that can be made.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 24, 2016 at 6:26 am
jonathanmarenus (5/22/2016)
I need something more efficient, so I then tried the following, but the performance was even worse:
DELETE a
FROM <tablename> a
WHERE EXISTS
(
    SELECT a.field1, a.field2, ..., a.field30
    INTERSECT
    SELECT b.field1, b.field2, ..., b.field30
    FROM <secondtablename> b
);
That's a weird way of doing an EXISTS; it requires an extra scan over table a in return for less typing (maybe less typing).
DELETE TableA
WHERE EXISTS (SELECT 1
              FROM TableB
              WHERE TableA.Field1 = TableB.Field1
                AND TableA.Field2 = TableB.Field2
                AND ...);
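One caveat with that form: INTERSECT treats two NULLs as a match, while a chain of = comparisons does not, so if any of the 30 columns is nullable you'd need null-safe comparisons to delete exactly the same rows. A sketch, showing the pattern for one column only:
-- Sketch: null-safe version of one comparison; the OR (... IS NULL ...) branch
-- would be repeated for each nullable column. Names are stand-ins.
DELETE TableA
WHERE EXISTS (SELECT 1
              FROM TableB
              WHERE (TableA.Field1 = TableB.Field1
                     OR (TableA.Field1 IS NULL AND TableB.Field1 IS NULL))
                AND ...);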
It's not going to be pretty though. Is this some archiving or background process?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 25, 2016 at 1:35 am
What about adding a new derived column and index to each table that creates a hash across the 30 fields? Then do a compare of the hashes.
ALTER TABLE <tablename>
ADD HashVal AS HASHBYTES('MD5'
    , ISNULL(Field1, '')
    + ',' + ISNULL(Field2, '')
    + ',' + ...
    + ',' + ISNULL(Field30, '')
) PERSISTED NOT NULL;

CREATE NONCLUSTERED INDEX ix_tablename_DeleteHash ON <tablename> (HashVal[, PK]);

ALTER TABLE <secondtablename>
ADD HashVal AS HASHBYTES('MD5'
    , ISNULL(Field1, '')
    + ',' + ISNULL(Field2, '')
    + ',' + ...
    + ',' + ISNULL(Field30, '')
) PERSISTED NOT NULL;

CREATE NONCLUSTERED INDEX ix_secondtablename_DeleteHash ON <secondtablename> (HashVal[, PK]);
Then do a delete based on the hash.
DELETE a
FROM <tablename> a
JOIN <secondtablename> b
  ON a.HashVal = b.HashVal [AND a.PK = b.PK];
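Note that the concatenation above assumes all 30 fields are character types. Anything else would need an explicit CONVERT first, roughly like this (illustrative only; the column types shown are assumptions):
-- Illustrative only: non-character columns must be converted to strings before
-- they can be concatenated into the HASHBYTES input. Column types are assumed.
ALTER TABLE <tablename>
ADD HashVal AS HASHBYTES('MD5'
    , ISNULL(Field1, '')                                  -- already varchar
    + ',' + ISNULL(CONVERT(varchar(12), Field2), '')      -- e.g. an int column
    + ',' + ISNULL(CONVERT(varchar(30), Field3, 126), '') -- e.g. a datetime column
) PERSISTED NOT NULL;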
May 25, 2016 at 8:00 am
If you go with the HASHBYTES method (which I too have used in the past for large-scale data-loading processes), consider using SHA or SHA1 instead of MD5 for a somewhat better hash: http://kejser.org/exploring-hash-functions-in-sql-server/
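For example, the computed column from the previous post would just swap the algorithm name (sketch):
-- Same computed column as in the previous post, with SHA1 in place of MD5.
ALTER TABLE <tablename>
ADD HashVal AS HASHBYTES('SHA1'
    , ISNULL(Field1, '')
    + ',' + ...
    + ',' + ISNULL(Field30, '')
) PERSISTED NOT NULL;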
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 6, 2016 at 9:45 am
If you're deleting millions of rows, then it's probably the transaction logging, not the actual join or delete operation, that's killing you. It's been a couple of weeks since your post, but if you're still struggling with this, consider "batching" your deletes, meaning write a looping block that deletes the TOP (X) rows per iteration. For example, if you're trying to delete 1 million rows, delete 100,000 per loop, so you start out deleting in 10 batches.
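A minimal sketch of that batching pattern (TableA, TableB, and the hash join are stand-ins for the real tables and matching logic):
-- Sketch: delete in batches and loop until a batch comes up short.
-- TableA/TableB and the join column are stand-ins.
DECLARE @BatchSize int = 100000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) a
    FROM TableA a
    JOIN TableB b
      ON a.HashVal = b.HashVal; -- or the full 30-column join

    IF @@ROWCOUNT < @BatchSize
        BREAK; -- the last batch was partial, so nothing is left to delete
END;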
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho