June 29, 2010 at 6:02 am
Hi All,
I am currently developing a package that moves rows from a source table to a destination table in another database. I then need to delete the copied rows from the source table.
At present I run an OLE DB Source with a SQL command to get the rows to copy, and pass these to an OLE DB Destination to write them to the destination table. This works perfectly. I then use an Execute SQL Task to delete the same rows from the source table.
The problem I am finding is that the source table is very, very busy, and in the time it takes to copy out the rows more data is added, so when the delete task runs more rows are deleted than were copied.
So far I have tried:
Creating a time-based parameter that works off a time column in the source data - didn't work
Putting all tasks in a container and using the container start time as a parameter to try and filter out rows added while the copy task was processing - didn't work.
What I need is a way to pass the IDs of the copied rows to the delete task so that only those explicit rows are deleted.
Here are my select and delete queries; both identify 'pairs' of records with AuditOperation values of 0 and 2 to be affected. Please note that the Execute SQL Task for the delete uses an ADO.NET connection; I don't know if that is part of the problem.
SELECT COUNT(*)
FROM dbo.Event WITH (NOLOCK)
WHERE Event.Id IN (SELECT zero.Id
FROM dbo.Event AS zero WITH (NOLOCK)
INNER JOIN dbo.Event AS two WITH (NOLOCK)
ON zero.Id = two.Id
AND two.AuditOperation = 2
WHERE zero.AuditOperation = 0)
DELETE
FROM dbo.Event
WHERE Event.Id IN (SELECT zero.Id
FROM dbo.Event zero WITH (NOLOCK)
INNER JOIN dbo.Event two WITH (NOLOCK)
ON zero.Id = two.Id
AND two.AuditOperation = 2
WHERE zero.AuditOperation = 0)
TIA,
Chris
June 29, 2010 at 6:30 am
I had to face the same problem and it's not so easy to solve with SSIS, so I decided to use .NET code.
I'm no SSIS guru, so probably there's something better, but this is how I solved it:
using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;

namespace BulkCopyUtils
{
    public class CopyData
    {
        string _sourceConnectionString;
        string _destinationConnectionString;

        public CopyData(string sourceConnectionString,
                        string destinationConnectionString)
        {
            _sourceConnectionString = sourceConnectionString;
            _destinationConnectionString = destinationConnectionString;
        }

        public void CopyTable(string srcTableName, string destTableName, Boolean withMove)
        {
            using (SqlConnection source =
                new SqlConnection(_sourceConnectionString))
            {
                string sql = string.Format("SELECT * FROM {0}", srcTableName);
                try
                {
                    // Read the source rows into a DataTable; AddWithKey pulls in the
                    // primary key so the adapter can generate the DELETE command later.
                    DataSet ds = new DataSet();
                    SqlDataAdapter da = new SqlDataAdapter(sql, source);
                    SqlCommandBuilder cmdBuilder = new SqlCommandBuilder(da);
                    da.MissingSchemaAction = MissingSchemaAction.AddWithKey;
                    da.Fill(ds, srcTableName);
                    DataTable dt = ds.Tables[0];

                    // Bulk copy exactly the rows that were just read to the destination.
                    using (SqlBulkCopy copy =
                        new SqlBulkCopy(_destinationConnectionString, SqlBulkCopyOptions.FireTriggers))
                    {
                        copy.DestinationTableName = destTableName;
                        copy.WriteToServer(dt);
                    }

                    if (withMove)
                    {
                        // Mark each copied row as deleted and push the deletes back to
                        // the source table, so only the rows actually copied are removed.
                        for (int i = 0; i < dt.Rows.Count; i++)
                        {
                            dt.Rows[i].Delete();
                        }
                        da.Update(dt);
                    }
                }
                catch (Exception)
                {
                    throw;
                }
                finally
                {
                    source.Close();
                }
            }
        }

        public void CopyTable(string srcTableName, string destTableName)
        {
            CopyTable(srcTableName, destTableName, false);
        }

        public void CopyTable(string srcTableName)
        {
            CopyTable(srcTableName, srcTableName, false);
        }
    }
}
You could put this class in a CLR project and use it in a procedure.
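For what it's worth, the SQLCLR deployment could look roughly like the sketch below. A procedure can only be bound to a public static method, so assume a small static wrapper (CopyTableProc here, a made-up name) is added around CopyData; the assembly path, procedure name and permission set are assumptions too.
-- Rough sketch only: names and path are placeholders, and EXTERNAL NAME
-- requires a public static method, so a static wrapper around CopyData is assumed.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

CREATE ASSEMBLY BulkCopyUtils
FROM 'C:\Assemblies\BulkCopyUtils.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;  -- the code opens its own connections
GO

CREATE PROCEDURE dbo.usp_CopyTable
    @sourceConnectionString      NVARCHAR(4000),
    @destinationConnectionString NVARCHAR(4000),
    @srcTableName                NVARCHAR(128),
    @destTableName               NVARCHAR(128),
    @withMove                    BIT
AS EXTERNAL NAME BulkCopyUtils.[BulkCopyUtils.CopyData].CopyTableProc;
GO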
-- Gianluca Sartori
June 29, 2010 at 7:08 am
Well said... True innovation knows no bounds... But to add from my side...
Let's make this method fool-proof:
Using the Try-Catch-Finally block you can add proper exception handling in the code module
(rather than catching the generic "Exception", catch SqlException and related, more specific types).
Cheerz to all!!!:-):-):-)
Raunak J
June 29, 2010 at 7:36 am
In your example you reference a column in the source table called "id".
If this is the primary key, with new rows getting a sequentially allocated value, then you could simply get the current maximum value for the key, preserve it in a local variable, and then use this to filter the INSERT and the associated DELETE.
DECLARE @Id BIGINT
SELECT @Id = MAX(Id)
FROM dbo.EventLog
INSERT INTO archivedb.dbo.EventLog (Id, col2, col3, col4)
SELECT Id,
col2,
col3,
col4
FROM dbo.EventLog
WHERE Id <= @Id
DELETE
FROM dbo.EventLog
WHERE Id <= @Id
(I have left out any error checking so as not to clutter the code)
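If you do want that error checking, one rough way to add it (using the same assumed table names as above) is to wrap the archive and the delete in a single transaction so they succeed or fail together:
DECLARE @Id BIGINT

SELECT @Id = MAX(Id)
FROM dbo.EventLog

BEGIN TRY
    BEGIN TRANSACTION

    -- Archive only rows at or below the captured key value.
    INSERT INTO archivedb.dbo.EventLog (Id, col2, col3, col4)
    SELECT Id, col2, col3, col4
    FROM dbo.EventLog
    WHERE Id <= @Id

    -- Delete exactly the same range from the source.
    DELETE
    FROM dbo.EventLog
    WHERE Id <= @Id

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION

    -- Re-raise the original error message.
    DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE()
    RAISERROR(@msg, 16, 1)
END CATCH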
June 29, 2010 at 12:54 pm
I like to make my processes pretty bullet-proof when I can.
I believe Spencer wrote about checking the max id; that method may work, provided the delete really should remove everything at or below that value.
I have a different method: after you copy the data over, you copy JUST the primary key values to a table in tempdb and use that table as part of the join criteria in the delete against the main table.
ie:
DELETE mt
FROM dbo.maintable mt INNER JOIN tempdb.dbo.tempKeyTable tkt
ON mt.KeyField = tkt.KeyField
Using this method, you CANNOT possibly delete a record that you don't KNOW is on the other side.
The downside is that as the table grows, so does the copy time; however, for a table with a single integer key, copying even several million id values should be pretty quick.
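As a plain T-SQL illustration of the whole pattern (table and column names here are just placeholders, and the archive table is assumed to have the same columns; in the package the copy step would be the data flow):
-- Capture the keys of the rows to be moved.
SELECT KeyField
INTO tempdb.dbo.tempKeyTable
FROM dbo.maintable WITH (NOLOCK)
WHERE ArchiveFlag = 1            -- placeholder: whatever criteria select the rows to archive

-- Copy only those rows (in the package this is the data flow / bulk copy step).
INSERT INTO archivedb.dbo.maintable
SELECT mt.*
FROM dbo.maintable mt
INNER JOIN tempdb.dbo.tempKeyTable tkt
    ON mt.KeyField = tkt.KeyField

-- Delete only the rows whose keys were captured, regardless of what arrived since.
DELETE mt
FROM dbo.maintable mt
INNER JOIN tempdb.dbo.tempKeyTable tkt
    ON mt.KeyField = tkt.KeyField

DROP TABLE tempdb.dbo.tempKeyTable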
CEWII
June 30, 2010 at 2:18 am
Hi All,
Thank you very much for the replies.
I would very much like to use the C# solution but unfortunately I have to fix this particular issue in SSIS.
I have now implemented a solution along the lines Elliott describes: I copy the unique identifier column Id out of the source table into an nvarchar column in a temp table, and then use that as a reference to identify which rows to delete.
It works perfectly, and the performance is quite acceptable. 🙂
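In case it helps anyone else, the delete in the Execute SQL Task ends up looking roughly like this (the staging table and column names below are illustrative, not my real ones):
-- Illustrative only: CopiedKeys stands in for the temp table holding the copied Ids,
-- which are stored as nvarchar; adjust the conversion if Id is not a uniqueidentifier.
DELETE e
FROM dbo.Event AS e
INNER JOIN dbo.CopiedKeys AS ck
    ON e.Id = CONVERT(UNIQUEIDENTIFIER, ck.Id)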
Thanks again!