September 2, 2015 at 8:37 am
Hi,
Table Name: CSV_Details - Total records is 12138186 row(s),
As below delete script taking more than 55 min and not yet competed..
Could you suggest if any alternative ways?
Delete from CSV_Details where CSVID not in
--select * from CSV_Details where CSVID not in
(
select CSVID from CSV_Details where Mark_Rev_No_fab in
(
select Mark_No from Transmittal_Planning_MarkNo where Mark_No not in
(
select distinct Mark_Rev_No_fab from CSV_Details
where Mark_Rev_No_fab in
(
select distinct Mark_No from Transmittal_Planning_MarkNo
)
and DeleteFlag is null
)
) and DeleteFlag is not null
) and DeleteFlag is not null and cast(CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
Thanks
September 2, 2015 at 8:52 am
First, is there any blocking going on? Second, have you looked at the execution plan to see what the pain points in the query are? If you have a table (or clustered index) scan on that large table, it's going to take a long time.
John
September 2, 2015 at 8:53 am
how many records is actually being deleted?
Good idea would be to do it in chunks.
September 2, 2015 at 9:12 am
Yes, or if this is part of an ETL process, consider copying the rows you want to keep to a new table, dropping the old table, and renaming the new one to the same name as the old. None of that will help, though, if the cause of the slowness is blocking, which is why that was my first question.
John
September 2, 2015 at 9:12 am
Is there a trigger on CSV_Details that is fired when there are deletes? If so, that is most likely the culprit.
Are there FKs to other tables that have ON DELETE CASCADE?
-SQLBill
September 2, 2015 at 9:15 am
Part of the problem is that your query is needlessly complex. You reference CVS_Details three times and Transmittal_Planning_MarkNo twice and some of these are redundant. To understand this, I'll use set notation. I'll use C to refer to CVS_Details and T to refer to Transmittal_Planning_MarkNo. Your inner queries are essentially T - (C n T), which is exactly the same as T - C. Without details of your data, it's difficult to tell if there are similar issues with the rest of the query.
Also, there is no need to use the DISTINCT keyword with IN/NOT IN subqueries.
Finally, EXISTS may perform better than IN/NOT IN, and will also produce different results if any of subqueries return NULL values.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 2, 2015 at 9:27 am
You might try reworking/tuning your DELETE statement.
Decent article on NOT IN alternatives: Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
YMMV but I'd also try identifying the values from your nested subqueries into a temp table and then JOINing to that for the delete.
CREATE TABLE #DeleteMe (CSVID INT PRIMARY KEY CLUSTERED);
INSERT INTO #DeleteMe (CSVID)
select CSVID from CSV_Details where Mark_Rev_No_fab in
(
select Mark_No from Transmittal_Planning_MarkNo where Mark_No not in
(
select distinct Mark_Rev_No_fab from CSV_Details
where Mark_Rev_No_fab in
(
select distinct Mark_No from Transmittal_Planning_MarkNo
)
and DeleteFlag is null
)
) and DeleteFlag is not NULL
DELETEcsv
FROMCSV_Details csv
LEFT OUTER JOIN #DeleteMe tmp ON csv.CSVID = tmp.CSVID
WHEREtmp.CSVID IS NULL
ANDcsv.DeleteFlag is not NULL
ANDCAST(csv.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
_____________________________________________________________________
- Nate
September 2, 2015 at 9:44 am
Don't disagree with other comments. But maybe I can give you a quick fix, and maybe not.
First, you need an index on CSV_Details keyed first by Mark_Rev_No_fab that contains CSVID. If you don't have such an index, create one.
Then try this:
--Delete from CSV_Details where CSVID not in
select * from CSV_Details where CSVID not in
(
select CSVID from CSV_Details where Mark_Rev_No_fab in
(
select Mark_No from Transmittal_Planning_MarkNo tpm
where not exists(select 1 from CSV_Details cd where cd.Mark_Rev_No_fab = tpm.Mark_No)
and DeleteFlag is null
) and DeleteFlag is not null
) and DeleteFlag is not null
and CreatedDate >= DATEADD(day, datediff(day, 0, GETDATE()), 0) - 90 and CreatedDate < DATEADD(day, datediff(day, 0, GETDATE()), 0)
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 2, 2015 at 10:55 am
First question should be how many records of the 12,000,000+ are you deleting?
September 2, 2015 at 12:20 pm
Hi,
Thanks to all for yours valuable points.
1. I have taken backup of that CSV_Details tables before doing delete operation
2. In First query got deleted total 676839 row(s) affected and took - 01:18:06 Hrs.
3. I am stopped that second delete iteration as below query..
DeleteFlag is not null and cast(CreatedDate as date) between cast(GETDATE()-180 as date) and cast(GETDATE()-89 as date
First I will test that script in test setup as suggested to me Mr. RP_DBA and Mr. ScottPletcher then will confirm you..
Thanks
September 3, 2015 at 1:48 am
-- Your query looks like this...
DELETE FROM CSV_Details
WHERE CSVID NOT IN
(
-- level 2
SELECT CSVID
FROM CSV_Details d2
INNER JOIN Transmittal_Planning_MarkNo m2
ON m2.Mark_No = d2.Mark_Rev_No_fab
WHERE d2.DeleteFlag IS NOT NULL
AND NOT EXISTS (
-- level 1
select d1.Mark_Rev_No_fab
from CSV_Details d1 -- not deleted
INNER JOIN Transmittal_Planning_MarkNo m1
ON m1.Mark_No = d1.Mark_Rev_No_fab
where d1.DeleteFlag is null
and d1.Mark_Rev_No_fab = m2.Mark_No
-- level 1
)
-- level 2
) AND DeleteFlag IS NOT NULL
AND cast(CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
-- Which could be as simple as this:
-- level 2
DELETE d2
FROM CSV_Details d2
WHERE d2.DeleteFlag IS NOT NULL
AND cast(d2.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
AND EXISTS (
-- level 1
SELECT 1
FROM CSV_Details d1 -- not deleted
--INNER JOIN Transmittal_Planning_MarkNo m1
--ON m1.Mark_No = d1.Mark_Rev_No_fab
WHERE d1.DeleteFlag IS NULL
AND d1.Mark_Rev_No_fab = d2.Mark_Rev_No_fab
-- level 1
)
-- level 2
-- Try this instead (this is the "evaluation version" i.e. will it work?):
;WITH Deleter AS (
SELECT
d.*,
AnyDeleted = MAX(d.DeleteFlag) OVER(PARTITION BY d.Mark_Rev_No_fab)
FROM CSV_Details d
WHERE EXISTS (SELECT 1 FROM Transmittal_Planning_MarkNo m WHERE m.Mark_No = d.Mark_Rev_No_fab)
AND cast(d.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
)
SELECT * FROM Deleter
-- which you would use like this:
;WITH Deleter AS (
SELECT
d.CSVID,
DeleteFlag,
AnyDeleted = MAX(d.DeleteFlag) OVER(PARTITION BY d.Mark_Rev_No_fab)
FROM CSV_Details d
WHERE EXISTS (SELECT 1 FROM Transmittal_Planning_MarkNo m WHERE m.Mark_No = d.Mark_Rev_No_fab)
AND cast(d.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)
)
DELETE FROM Deleter WHERE DeleteFlag IS NULL AND AnyDeleted IS NOT NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 4, 2015 at 4:20 am
Forget everything about table structure and table size...
Just replace NOT IN and IN from your query with EXISTS and NOT EXISTS. You will get good performance in your delete query.
September 11, 2015 at 8:22 am
I will try to replace NOT IN with LEFT JOIN and filtering out with left column of left table IS NOT NULL. It had helped me in many cases.
September 11, 2015 at 8:50 am
Walton (9/11/2015)
I will try to replace NOT IN with LEFT JOIN and filtering out with left column of left table IS NOT NULL. It had helped me in many cases.
Have you tried the code I posted? There's a SELECT version so you can see what the statement will do without committing to any changes.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply