September 11, 2012 at 9:36 am
You could try using something like this:
declare @BatchSize int = 10000;
while @BatchSize <> 0
begin
begin try
begin transaction
delete top (@BatchSize)
from dbo.SPTLog
output
deleted.LogID, deleted.LogTime, getdate()
into
dbo.SPTLogArchive(LogID,LogTime,ArchiveTime)
where
LogTime < @dCutOffDate;
set @BatchSize = @@ROWCOUNT;
commit transaction;
end try
begin catch
rollback transaction;
set @BatchSize = 0; -- if you want to abort the delete, or set to 10000 if to continue
end catch
end
September 11, 2012 at 9:39 am
You might want to put a DELAY in the TRY part of the code after the commit to allow other activities to access the table from which you are deleting rows.
September 11, 2012 at 9:40 am
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 11, 2012 at 9:43 am
ChrisM@Work (9/11/2012)
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
In fact what indexes are defined on the table dbo.SPTLog?
September 11, 2012 at 9:49 am
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
In fact what indexes are defined on the table dbo.SPTLog?
Is LogID a foreign key in another table?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 11, 2012 at 10:08 am
ChrisM@Work (9/11/2012)
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
In fact what indexes are defined on the table dbo.SPTLog?
Is LogID a foreign key in another table?
Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.
Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.
September 11, 2012 at 11:17 am
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
In fact what indexes are defined on the table dbo.SPTLog?
Is LogID a foreign key in another table?
Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.
Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.
Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.
LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive
In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)
In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)
In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key
Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.
September 11, 2012 at 12:04 pm
Appreciate you telling us what indexes are defined, but it would be nice to see the actual definitions.
September 11, 2012 at 1:06 pm
/*Indexes for SPTlogArchive*/
/****** Object: Index [IX_SPTLogArchive_LogTime] Script Date: 09/11/2012 14:46:56 ******/
CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]
(
[LogTime] ASC
)
INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
----------------------
/****** Object: Index [PK_SPTLogArchiveID] Script Date: 09/11/2012 14:53:06 ******/
ALTER TABLE [dbo].[SPTLogArchive] ADD CONSTRAINT [PK_SPTLogArchive] PRIMARY KEY CLUSTERED
(
[SPTLogArchiveID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
----------------------------------
/*Indexes for SPTlog*/
----------------------------------
/****** Object: Index [IX_index_SPTLog_LogTime] Script Date: 09/11/2012 15:00:25 ******/
CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]
(
[LogTime] ASC
)
INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-------------------
/****** Object: Index [PK_SPTLog] Script Date: 09/11/2012 15:00:55 ******/
ALTER TABLE [dbo].[SPTLog] ADD CONSTRAINT [PK_SPTLog] PRIMARY KEY CLUSTERED
(
[LogID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
September 11, 2012 at 1:12 pm
The DELETE will be doing a CLUSTERED INDEX (i.e. table) scan. It won't use the index on LogTime.
Just so you know.
September 11, 2012 at 1:25 pm
So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?
Any ideas on what to change to optimize the delete?
September 11, 2012 at 1:32 pm
Lynn,
Just got done running the query. I deleted 47,360 records in 1hr 27 mins using your method. Wow, still too long!
September 11, 2012 at 1:33 pm
oscarooko (9/11/2012)
So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?Any ideas on what to change to optimize the delete?
Basically, yes, that is what I am saying. Using the index and then bookmark loops to delete records will be more expensive than just scanning the clustered index (table). If you were deleting individual rows using the index, then SQL Server would most likely use the index.
One way would be to have the LogTime defined as your clustered index. If LogID is an identity column, you could have it as the second column of the clustered index and make it a multi-column clustered index. You could still have LogID as the PK, just create it as a nonclustered index.
September 11, 2012 at 2:48 pm
oscarooko (9/11/2012)
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
Lynn Pettis (9/11/2012)
ChrisM@Work (9/11/2012)
oscarooko (9/11/2012)
It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!
SELECT [LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?
In fact what indexes are defined on the table dbo.SPTLog?
Is LogID a foreign key in another table?
Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.
Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.
Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.
LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive
In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)
In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)
In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key
Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.
If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
September 12, 2012 at 4:50 am
ChrisM@home (9/11/2012)
...If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.
This doesn't quite stack up, and in any case if there were a FK relationship between the tables, then Lynn's code would have raised the following error:
"Msg 332, Level 16, State 1, Line 41
The target table 'dbo.SPTLogArchive' of the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship. Found reference constraint 'FK_SPTLogArchive_SPTLog'."
However, you've stated that there is a FK relationship somewhere, so here's a code block to work around it, modified from your original code:
DECLARE @dCutOffDate DATE;
SET @dCutOffDate = '2008-02-20'; -- or whatever you want your value to be
ALTER TABLE [dbo].[SPTLog] NOCHECK CONSTRAINT ALL;
ALTER TABLE [dbo].[SPTLogArchive] NOCHECK CONSTRAINT ALL;
begin try;
begin transaction;
INSERT INTO [dbo].[SPTLogArchive]
([LogID]
,[LogTime]
,[ArchiveTime])
SELECT
[LogID]
,[LogTime]
,GetDate()
FROM [dbo].[SPTLog]
WHERE [LogTime] < @dCutOffDate;
DELETE FROM [dbo].[SPTLog]
WHERE LogTime < @dCutOffDate;
commit transaction;
end try
begin catch;
rollback transaction;
end catch
ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;
ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;
The code disables the FK constraints before the transaction and enables them again afterwards. Notice that the two enable statements
ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;
ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;
differ. The second statement (for SPTLogArchive)uses the clause “WITH NOCHECK” which instructs SQL Server NOT to check the new values – if it does, the statement will fail.
This batch works just fine, moving about 40 thousand of 3 million rows from SPTLog to SPTLogArchive in about a second on my local instance. I've not incorporated "chunking" into this, where subsets of the data are moved (INSERT and DELETE of the same set of keys) within separate transactions to mitigate resource waits.
Here’s the sample setup I used for the test:
USE Sandbox
GO
DROP TABLE SPTLogArchive
DROP TABLE SPTLog
GO
----------------------------------
/* SPTlog */
----------------------------------
CREATE TABLE [dbo].[SPTLog] (
LogID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLog PRIMARY KEY CLUSTERED,
LogTime DATE NOT NULL)
-- LogID is redundant - it's already part of the index as the cluster key
CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]
([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
----------------------------------
/* SPTLogArchive */
----------------------------------
CREATE TABLE [dbo].[SPTLogArchive] (
SPTLogArchiveID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLogArchive PRIMARY KEY CLUSTERED,
LogID INT NULL CONSTRAINT FK_SPTLogArchive_SPTLog REFERENCES SPTLog (LogID),
LogTime DATE NOT NULL,
ArchiveTime DATE NOT NULL)
CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]
([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
----------------------------------
/* Load sample data into SPTLog */
----------------------------------
SET IDENTITY_INSERT [dbo].[SPTLog] ON
INSERT INTO [dbo].[SPTLog] (LogID,LogTime)
SELECT
NewLogid = ROW_NUMBER() OVER(ORDER BY rn DESC),
NewDate = DATEADD(minute,0-rn,GETDATE())
FROM (
SELECT TOP 3000000 rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM SYS.columns a, sys.columns b, sys.columns c
) d
SET IDENTITY_INSERT [dbo].[SPTLog] OFF
ALTER INDEX PK_SPTLog ON SPTLog REBUILD
GO
DON’T use this in a production environment. What we need to see now is the full DDL for both tables.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 16 through 30 (of 32 total)
You must be logged in to reply to this topic. Login to reply