August 15, 2012 at 3:02 pm
Title kind of says it all. I have a SQL 2005 Standard Edition, default instance where one of my PROD databases is running out of space. So - one of the bigger, less frequently used tables has been created in an ARCH db in the same instance (residing on a different drive). I copied the original set of rows from the PROD table to this ARCH table, but there are new records that come in to the PROD table on a semi-frequent basis.
I need to construct some sort of logic with T-SQL to compare and see what is already in the ARCH table, and delete those records from the PROD table, while leaving the rows that are not in the ARCH table yet alone for later exporting...
Any help with a PROC or Ad-Hoc way of doing this would be very much appreciated.
SQL_ME_RICH
P.S. SSIS is not an option. 🙁
August 15, 2012 at 3:07 pm
Are the table structures the same?
To be sure, you want to delete rows from PROD where they already exist in ARCH, correct?
Also, do both tables have a PK defined?
August 15, 2012 at 3:14 pm
Hi Lynn - thank you so much for your fast reply!
No - the tables are different in that the one in PROD has a PKey/Clustered Index on the leading column, and the one in the ARCH db is just a heap. Here is the table in the ARCH - same as PROD less the PKey/CI...
USE [yQueue_Archive]
GO
/****** Object: Table [dbo].[yConnections] Script Date: 08/14/2012 11:18:19 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[yConnections](
[ConnectionID] [varchar](30) NOT NULL,
[GUID] [uniqueidentifier] NULL,
127.0.0.1 [varchar](50) NULL,
[Interface] [varchar](20) NULL,
[Carrier] [varchar](20) NULL,
[Connects] [int] NULL,
[Disconnects] [int] NULL,
[KeepAlives] [int] NULL,
[TrxIn] [int] NULL,
[TrxOut] [int] NULL,
[TrxOutSMS] [int] NULL,
[TrxErrors] [int] NULL,
[TrxExceptions] [int] NULL,
[BytesIn] [bigint] NULL,
[BytesOut] [bigint] NULL,
[StampStart] [smalldatetime] NULL,
[StampLast] [smalldatetime] NULL,
[Serial] [varchar](20) NULL,
[LastStatus] [varchar](20) NULL,
[MaxQueueTime] [smallint] NULL,
[AvgQueueTime] [smallint] NULL,
[HealthIndex] [smallint] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
I've tried a few Ad-Hoc attempts to see if I could even just get a result set, but am running into problems getting them to identify the 2 databases (probably just rookie dev mistake stuff, but here is what I have been trying to use to find records which exist in PROD table but not in ARCH table):
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.Id = t1.Id)
or
SELECT * FROM t1 LEFT OUTER JOIN T2 on t1.Id = t2.Id WHERE t2.Id IS NULL
Really the idea here is the create this process into a PROC that I can run in the evenings, and then an export via the Imp/Exp Wiz to get the records over into ARCH. Again - no SSIS (don't get me started on that one).
Thanks!
August 15, 2012 at 3:14 pm
Something like this should work, but it may need some tweaking depending on how many rows will be deleted, especially the first time.
DELETE FROM t1
FROM dbo.MyTable t1
WHERE EXISTS(SELECT 1 FROM ARCH.dbo.MyTable t2 WHERE t2.PK_Column = t1.PK_Column);
August 15, 2012 at 3:21 pm
Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?
DELETE FROM AutoDispatch
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?
Thank you again!
August 15, 2012 at 3:21 pm
Can you explain why you didn't just create a new filegroup with with the data file for that filegroup on the drive where you have the ARCH database and move the table to that filegroup so that you don't have these issues?
Or just create a synonym (or view) for that table in the PROD database that points to the table in the ARCH database so that any inserts into the PROD table actually just go into the ARCH database.
August 15, 2012 at 3:22 pm
You can also do some reading on EXCEPT and INTERSECT in Books Online. These could be used to identify rows that exist in one table and not another or rows that are common between the two tables.
August 15, 2012 at 3:24 pm
SQL_ME_RICH (8/15/2012)
Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?
DELETE FROM AutoDispatch
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?
Thank you again!
No. This is what your delete would look like.
USE AutoDispatch;
go
DELETE FROM t1
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
August 15, 2012 at 3:25 pm
Michael - great question, and not one that I hadn't thought of, but the place I am working at is not looking at best practices because in all honesty - they do not much care about this environment right now. I am trying to keep it limping along until they migrate it over fully to MySQL (another story for another tim), but best I can tell you is that I am given directives that are akin to putting a bandaid on a cut-off arm.
August 15, 2012 at 3:27 pm
Lynn Pettis (8/15/2012)
SQL_ME_RICH (8/15/2012)
Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?
DELETE FROM AutoDispatch
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?
Thank you again!
No. This is what your delete would look like.
USE AutoDispatch;
go
DELETE FROM t1
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
Perfect Lynn! Thank you, and thank you for the further tips on the EXCEPT and INTERSECT clauses!
🙂
August 15, 2012 at 3:27 pm
As for the tweaks:
USE AutoDispatch;
go
declare @BatchSize int;
set @BatchSize = 10000;
WHILE @BatchSize > 0
BEGIN
DELETE TOP (@BatchSize)
FROM t1
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
set @BatchSize = @@ROWCOUNT;
END
And that is at a minimum. Please read the following article for more information:
August 15, 2012 at 3:45 pm
I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.
Thank you again, Lynn!
P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).
August 15, 2012 at 3:48 pm
SQL_ME_RICH (8/15/2012)
I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.Thank you again, Lynn!
P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).
Remember, we are all volunteers and we all have lives outside of here and work. Just as I said, I was leaving work yesterday when I saw it, couldn't do anything as I had to leave. Just forgot about afterward. A bump today would have worked just as well as a new thread.
August 16, 2012 at 11:16 am
Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?
I tried running this last night, but it ran for over 17 hours and still did not complete.
🙁
August 16, 2012 at 11:20 am
SQL_ME_RICH (8/16/2012)
Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?I tried running this last night, but it ran for over 17 hours and still did not complete.
🙁
The subquery already has a WHERE clause. What you need, most likely, is an index on the t2.PK_Column column on the table ARCH.dbo.MyTable.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply