February 11, 2019 at 8:58 am
Hello,
I have a table with 200 millions records and the table has
p_Index] int IDENTITY(1,1) NOT NULL,
SessionID] [varchar](1000) NULL,
p_Line [nvarchar](max) NOT NULL,
Timestamp [datetime] NOT NULL,
FixedColumnsInPipeline] [int] NOT NULL,
pType] [varchar](50) NULL
I'm not sure how to delete the huge records on this table and I wanted to keep the recent year.
Please help.
Thanks in advance for your helps.
-Edwin
February 11, 2019 at 9:11 am
Of the 200 million rows, how many are in the current year, i.e. how many do you want to keep?
February 11, 2019 at 10:56 am
I would write a script to delete the rows in small defined batches, something like this:
DECLARE @Batchsize INT
SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */
IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;
IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;
CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the p_Index that are over 1 year old in a temporary table */
SELECT t.p_Index
INTO #RowsToDelete
FROM myTable t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());
ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);
DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount = @Batchsize BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.p_Index INTO #RowsDeleted
FROM myTable t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);
SET @RowCount = @@ROWCOUNT;
DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);
TRUNCATE TABLE #RowsDeleted;
END;
GO
DROP TABLE #RowsDeleted;
GO
DROP TABLE #RowsToDelete;
GO
February 12, 2019 at 7:27 am
Jonathan AC Roberts - Monday, February 11, 2019 10:56 AMI would write a script to delete the rows in small defined batches, something like this:
DECLARE @Batchsize INT
SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the p_Index that are over 1 year old in a temporary table */
SELECT t.p_Index
INTO #RowsToDelete
FROM myTable t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount <> @Batchsize BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.p_Index INTO #RowsDeleted
FROM myTable t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);SET @RowCount = @@ROWCOUNT;
DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);TRUNCATE TABLE #RowsDeleted;
END;
GO
DROP TABLE #RowsDeleted;
GO
DROP TABLE #RowsToDelete;
GO
I agree with this approach, but i wanted to know how many rows the OP wanted to keep because my thinking was, if it's a small proportion of the total number of rows, it might be quicker to create a new table, insert the rows to keep into the new table, then truncate the original table and repopulate it with the data inserted into the new table.
Would need to use IDENTITY_INSERT, but not difficult.
Regards
Lempster
February 12, 2019 at 7:47 am
Lempster - Tuesday, February 12, 2019 7:27 AMJonathan AC Roberts - Monday, February 11, 2019 10:56 AMI would write a script to delete the rows in small defined batches, something like this:
DECLARE @Batchsize INT
SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the p_Index that are over 1 year old in a temporary table */
SELECT t.p_Index
INTO #RowsToDelete
FROM myTable t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount <> @Batchsize BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.p_Index INTO #RowsDeleted
FROM myTable t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);SET @RowCount = @@ROWCOUNT;
DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);TRUNCATE TABLE #RowsDeleted;
END;
GO
DROP TABLE #RowsDeleted;
GO
DROP TABLE #RowsToDelete;
GOI agree with this approach, but i wanted to know how many rows the OP wanted to keep because my thinking was, if it's a small proportion of the total number of rows, it might be quicker to create a new table, insert the rows to keep into the new table, then truncate the original table and repopulate it with the data inserted into the new table.
Would need to use IDENTITY_INSERT, but not difficult.
Regards
Lempster
It might be but the way I read it was it might be put into a regular job to keep only 1 year's data on the table.
In which case a script that could be run on a regular basis would be better. Have to wait and see what the OP had in mind.
February 12, 2019 at 8:44 am
I'm not hundred percent sure on the historian data that the business wanted to keep. My best guest is one year.
Also, I did some research and some suggested to:
.1. Create a table and insert the historian data into a new table.
.2. Truncated the old table
.3. Rename the new table to old table.
February 12, 2019 at 8:50 am
I'm reviewing Jonathan T-sql scripts and update shortly with questions.
Many thanks for Jonathan AC Robert and everyone helps. I'm very appreciated.
February 12, 2019 at 9:20 am
Jonathan AC Roberts - Monday, February 11, 2019 10:56 AMI would write a script to delete the rows in small defined batches, something like this:
DECLARE @Batchsize INT
SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the p_Index that are over 1 year old in a temporary table */
SELECT t.p_Index
INTO #RowsToDelete
FROM myTable t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount = @Batchsize BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.p_Index INTO #RowsDeleted
FROM myTable t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);SET @RowCount = @@ROWCOUNT;
DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);TRUNCATE TABLE #RowsDeleted;
END;
GO
DROP TABLE #RowsDeleted;
GO
DROP TABLE #RowsToDelete;
GO
Sorry, correction needed to the script. It should read:WHILE @RowCount = @Batchsize BEGIN
I had <> instead of =
February 12, 2019 at 4:58 pm
Thank you, Jonanthan. Very appreciated.
February 12, 2019 at 5:10 pm
DECLARE @Batchsize INT
SET @Batchsize = 50; /* Change to whatever is a suitable number of rows to delete in the inner loop */
IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;
IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;
CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the Pipeline_Index that are over 1 year old in a temporary table */
SELECT t.Pipeline_Index
INTO #RowsToDelete
FROM dbo.Pipeline_s t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());
ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);
DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount = @Batchsize
BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.Pipeline_Index INTO #RowsDeleted
FROM dbo.Pipeline_s t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);
SET @RowCount = @@ROWCOUNT;
DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);
TRUNCATE TABLE #RowsDeleted;
END;
GO
DROP TABLE #RowsDeleted;
GO
February 13, 2019 at 4:19 am
Ed7 - Tuesday, February 12, 2019 5:10 PMI'm not hundred percent sure after I modified your scripts and executed. It returned zero records.
DECLARE @Batchsize INT
SET @Batchsize = 50; /* Change to whatever is a suitable number of rows to delete in the inner loop */IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL DROP TABLE #RowsToDelete;IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
DROP TABLE #RowsToDelete;IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL DROP TABLE #RowsDeleted;IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
DROP TABLE #RowsDeleted;CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );
/* Store all the Pipeline_Index that are over 1 year old in a temporary table */SELECT t.Pipeline_Index INTO #RowsToDelete FROM dbo.Pipeline_s t WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());/* Store all the Pipeline_Index that are over 1 year old in a temporary table */
SELECT t.Pipeline_Index
INTO #RowsToDelete
FROM dbo.Pipeline_s t
WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());ALTER TABLE #RowsToDelete ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);ALTER TABLE #RowsToDelete
ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);DECLARE @RowCount INT; SET @RowCount = @Batchsize;/* Delete in batches */WHILE @RowCount = @Batchsize BEGIN DELETE TOP (@Batchsize) t OUTPUT deleted.Pipeline_Index INTO #RowsDeleted FROM dbo.Pipeline_s t WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);DECLARE @RowCount INT;
SET @RowCount = @Batchsize;
/* Delete in batches */
WHILE @RowCount = @Batchsize
BEGIN
DELETE TOP (@Batchsize) t
OUTPUT deleted.Pipeline_Index INTO #RowsDeleted
FROM dbo.Pipeline_s t
WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);SET @RowCount = @@ROWCOUNT;SET @RowCount = @@ROWCOUNT;
DELETE rtd FROM #RowsToDelete rtd WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);DELETE rtd
FROM #RowsToDelete rtd
WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);TRUNCATE TABLE #RowsDeleted;TRUNCATE TABLE #RowsDeleted;
END;GODROP TABLE #RowsDeleted;GOEND;
GO
DROP TABLE #RowsDeleted;
GO
I don't think the code you have pasted in can be the code you have. A lot of the lines are doubled-up in your code.
Can you paste in your code without the doubled up lines and check it has copied correctly?
February 13, 2019 at 4:49 pm
Here are the scripts
==============
DECLARE @Batchsize INT
SET @Batchsize = 5; /* Change to whatever is a suitable number of rows to delete in the inner loop */
Questions:
What is mean the batchsize = 50? Would you please explain and clarify?
Many thanks,
Edwin
February 13, 2019 at 5:09 pm
@Batchsize is the number of rows it deletes each time it goes round the inner loop. You need to keep this value small enough so it doesn't take too long in each iteration of the loop. This is so other queries can access the table.
I'm a bit concerned about your query:SELECT Pipeline_Index
INTO #RowsToDelete
FROM dbo.Pipeline_s t
WHERE [Timestamp] > (GETDATE() - 7);
This will delete rows that are newer than one week old, not rows over a year old!
I think you meant to have something like:SELECT Pipeline_Index
INTO #RowsToDelete
FROM dbo.Pipeline_s t
WHERE [Timestamp] < DATEADD(yy,-1,GETDATE);
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply