(Quickly) Delete records from a table, with condition

Question

(Quickly) Delete records from a table, with condition

tacy.highland

Hall of Fame

Points: 3319
More actions
May 5, 2017 at 12:17 pm

#336727

Hello all,
I'm supporting a database that I inherited so now having to handle some issues that are now arising. In particular, I have a stored proc which just started giving me fits. This proc builds a temp table with a bunch of records within a date range, then deletes records within that same date range from a physical table (Table A), then inserts the records from the temp table into that physical table -- essentially providing new updated data in that physical table.
The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.
I don't know if there's a better way to delete and insert a large number of records but was reading something about a variation, using a derived table like so:
delete t1
from (select top (10000) *
from t1 order by a) t1
I was going to rearrange my DELETE statement to something like:
DELETE TABLE A FROM (SELECT * FROM TABLE A WHERE DATE BETWEEN X AND X)
But the syntax looked a little odd to me. Mind you, there's no primary key on TABLE A so I can't do a DELETE FROM TABLE A WHERE ID IN (SELECT ID FROM #TEMPTABLE)
Any thoughts out there on the best way to handle this?
(BTW, after the delete is a pretty typical insert statement: INSERT TABLE A (col1, col2...) SELECT X, Y, Z FROM #TEMPTABLE)
Any help would be greatly appreciated!
Thanks

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Matt Simmons SSCrazy Points: 2846 More actions · Answer 1

Tacy,

Assuming you are not in a position to add a PK I would suggest adding an Index on your Date Field and depending how "wide" you data is to cover the rest of the table in the include.

Regards,
Matt

tacy.highland Hall of Fame Points: 3319 More actions · Answer 2

Thanks, Matt. But I'm not sure I'm following. There is an index on the date column. (Non Unique, Non Clustered). I'm looking to delete everything in the physical table in that date range.
What exactly are you suggesting?

Phil Parkin SSC Guru Points: 246985 More actions · Answer 3

Can you give us an idea how many rows are in the table and how many, typically, are in one deletion batch?

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

tacy.highland Hall of Fame Points: 3319 More actions · Answer 4

There are around 1.5 million rows in the physical table. I'm trying to delete around 70K-100K rows on the delete statement.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

Wild idea:
Why don't you simply update the table instead of deleting and re-inserting?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

tacy.highland Hall of Fame Points: 3319 More actions · Answer 6

An update is not an option here.

Any other ideas? I'm getting desperate....

Phil Parkin SSC Guru Points: 246985 More actions · Answer 7

tacy.highland - Friday, May 5, 2017 2:20 PM
An update is not an option here.
Any other ideas? I'm getting desperate....

This will require some code changes, but have you considered adding a new BIT column to your table called IsDeleted (or whatever).
Instead of deleting the rows in the proc, just set the IsDeleted flag to 1.
Have a separate process running periodically to batch delete the rows where IsDeleted = 1.
Any code accessing the table would have to be modified to exclude rows where IsDeleted = 1.
This would speed up your proc, yet achieve the same end result.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Matt Simmons SSCrazy Points: 2846 More actions · Answer 8

My original thought was that since there was no PK there where no indexes either. Sorry that won't work.
My only other thought is to do the opposite of what you are trying to do. Copy the Good Data into a new/temp table, truncate the old table, copy the good data back into you table or similarly do it with a series of renaming tables.
Hope this helps.

Regards,
Matt

tacy.highland Hall of Fame Points: 3319 More actions · Answer 9

I can't truncate the table because I'm only updating/reinserting some records, not all. That's why I have to specify the date range in the delete statement.

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 10

Something like the following perhaps?
USE tempdb; GO

-- Create and populate a 1.5M row TestTable... IF OBJECT_ID('tempdb.dbo.TestData', 'U') IS NULL BEGIN CREATE TABLE dbo.TestData ( ID INT NOT NULL PRIMARY KEY CLUSTERED, SomeDate DATE NOT NULL );

WITH cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b), cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b), cte_Tally (n) AS ( SELECT TOP 1500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM cte_n3 a CROSS JOIN cte_n3 b ) INSERT dbo.TestData (ID, SomeDate) SELECT ID = t.n, SomeDate = DATEADD(dd, CHECKSUM(NEWID()) % 5000, '2010-01-01') FROM cte_Tally t;

-- Add a nonclustered index to the TestData table to speed up the following delete operation... CREATE NONCLUSTERED INDEX ix_TestData_SomeDate ON dbo.TestData (SomeDate); END;

--=====================================================================

-- Create a temp table to hold the deleted rows... IF OBJECT_ID('tempdb..#Deleted', 'U') IS NOT NULL DROP TABLE #Deleted;

CREATE TABLE #Deleted ( ID INT, SomeDate DATE );

-- Delete the desired range of rows and use the OUTPUT clause to add the deleted rows to the temp table... DELETE td OUTPUT Deleted.* INTO #Deleted FROM dbo.TestData td WHERE td.SomeDate >= '2010-01-01' AND td.SomeDate < '2012-01-01';

--===================================================================== -- >>> Deleted rows are now in #Deleted and able to be "updated" as desired... <<< --=====================================================================

-- Reinsert the previously deleted (and subsiquently updated) rows back to the original table... INSERT dbo.TestData (ID, SomeDate) SELECT d.ID, d.SomeDate FROM #Deleted d;

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 11

tacy.highland - Friday, May 5, 2017 2:20 PM
An update is not an option here.
Any other ideas? I'm getting desperate....

Why is UPDATE not an option?
Your table doesn't have a PK, does it have a clustered index?
Is there an index on the date column?

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 12

tacy.highland - Friday, May 5, 2017 12:17 PM
The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.

If it was working fine before and is now suddenly "hanging", that's a pretty good indication of two things... 1) parameter sniffing and 2) statistics that haven't been rebuilt in a while.

The first can be fixed by adding an option to recompile and should probably be done at the statement level rather than the SP level. Even that won't help, though, if statistics are out of date. The fix for that, of course, is to rebuild the statistics. If the SP is used on a regular basis to delete and the insert a decent number of rows on a regular basis, the number of rows may still not be big enough to trigger an auto-update of stats. If you figure it out and do a little planning, you might want to update either index or column stats at the beginning of the proc every time it runs only for the stats that the SP uses and then rebuilt all stats that have a modified rowcount for the whole table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 13

Jeff Moden - Saturday, May 6, 2017 8:59 PM
tacy.highland - Friday, May 5, 2017 12:17 PM
The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.
If it was working fine before and is now suddenly "hanging", that's a pretty good indication of two things... 1) parameter sniffing and 2) statistics that haven't been rebuilt in a while.
The first can be fixed by adding an option to recompile and should probably be done at the statement level rather than the SP level. Even that won't help, though, if statistics are out of date. The fix for that, of course, is to rebuild the statistics. If the SP is used on a regular basis to delete and the insert a decent number of rows on a regular basis, the number of rows may still not be big enough to trigger an auto-update of stats. If you figure it out and do a little planning, you might want to update either index or column stats at the beginning of the proc every time it runs only for the stats that the SP uses and then rebuilt all stats that have a modified rowcount for the whole table.

Seems most likely to me. Less likely is that the table is a heap (we don't know yet) and repeated delete/insert cycles have increased the size of the table somewhat. There's a good article covering this here.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Sergiy SSC Guru Points: 110208 More actions · Answer 14

Are there other indexes on the table, apart of the non clustered on Date?

_____________
Code for TallyGenerator