March 8, 2022 at 9:22 am
Hi all
I'll explain this as best I can but I've no doubt I'll miss something so feel free to ask questions.
Anyway, here goes......
We've got a copy of a database in the cloud (it's a third party database) and we need to copy the data from there to a local database (in exactly the same format).
We're currently doing this in SSIS (the packages are supplied by the 3rd party as well; we're just tweaking them to improve performance).
There are 3 packages, as follows:-
Master - This controls everything and does a few things that are only needed once per run. It also spawns 8 copies of the next package (GetNextTable.dtsx).
GetNextTable - This grabs the next table in the list, does a couple of checks (makes sure the table exists in the source and that there's new data in it) and then spawns a copy of "FDSyncData_Source2Dest.dtsx". This is all contained in a For Loop container that repeats until all the tables have been dealt with.
FDSyncData_Source2Dest - This does the actual transfer of the data. It appears to use a script task to do the actual transfer (again, this is supplied by the 3rd-party) and I'm not sure if that is part of the issue.
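So the hierarchy looks roughly like this (my own sketch of the above, not the vendor's documentation):
Master.dtsx
    -> 8 x GetNextTable.dtsx (run in parallel)
        -> For Loop: spawns FDSyncData_Source2Dest.dtsx once per table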
Before I started making the changes to load multiple tables at once, everything was done sequentially (and was therefore taking forever) and we were still running into the same issue.
Now for the issue......
I've got around 4,000 tables to check for new data (hence the need to load them simultaneously), but I'm running out of memory because of the number of FDSyncData_Source2Dest packages being spawned (one per table). They don't appear to shut down once they've finished transferring the data from the tables they've been assigned.
Running it on my local machine, memory usage gets to around 4GB (I've checked in Task Manager) and then the packages start throwing out-of-memory errors.
Does anyone know how to get the packages to close properly (and therefore free up memory) when they've completed?
Any help on this would be greatly appreciated.
If I've missed anything, feel free to ask questions.
TIA
Richard
March 8, 2022 at 9:34 am
When you refer to 'spawning' of packages, is this done via multiple Execute Package tasks, or in some other way?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 8, 2022 at 9:37 am
Hi Phil
Each of the 8 GetNextTable packages spawns its own copy of FDSyncData_Source2Dest using an Execute Package task.
It looks like I can get to around 1000 of the FDSyncData_Source2Dest packages before I run out of memory.
Regards
Richard
March 8, 2022 at 10:06 am
I don't know how to fix things using your current top-down package hierarchy, though I do know that there used to be a memory leak in FOR loops years ago (which I'd assumed was fixed, to be honest) – link.
Can you provide the code which is inside the Script Task (which does the data transfer)? Perhaps there are some objects which are not being properly disposed of.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 8, 2022 at 10:14 am
Hi Phil
Thanks for the link - it looks like the memory leak is still there. Not sure how I'm going to resolve that one with a script task, but I'll do some research and see what I can come up with. It's a shame MS haven't fixed that one in the last 10 years or so.
Anyway, here's the code (in C#) for the transfer script task:-
#region Help: Introduction to the script task
/* The Script Task allows you to perform virtually any operation that can be accomplished in
 * a .Net application within the context of an Integration Services control flow.
 *
 * Expand the other regions which have "Help" prefixes for examples of specific ways to use
 * Integration Services features within this script task. */
#endregion

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.SqlClient;
#endregion

namespace ST_9372ec3a55a547b39e117374fb1dcd22
{
    /// <summary>
    /// ScriptMain is the entry point class of the script. Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        bool fireAgain;
        string strSelectStatement;
        string strDestinationStatement;

        public void Main()
        {
            // Get the Select statement and destination statement which were created in a previous task
            strSelectStatement = (string)Dts.Variables["Select_Statement"].Value;
            strDestinationStatement = (string)Dts.Variables["Destination_Statement"].Value;
            //MessageBox.Show("0");
            Dts.TaskResult = (int)ScriptResults.Failure; // initial state
            //MessageBox.Show("1");

            // Open a sourceConnection to the database.
            object rawSourceConnection = Dts.Connections["Source_Database_ADO"].AcquireConnection(Dts.Transaction);
            //MessageBox.Show("1.1");
            SqlConnection sourceConnection = (SqlConnection)rawSourceConnection;
            //MessageBox.Show("2");

            // Get data from the source table as a SqlDataReader.
            SqlCommand commandSourceData = new SqlCommand(strSelectStatement, sourceConnection);
            //MessageBox.Show("2.1");
            SqlDataReader reader = commandSourceData.ExecuteReader();
            //MessageBox.Show("3");

            // Open the destination connection.
            object rawTargetConnection = Dts.Connections["Dest_Database_ADO"].AcquireConnection(Dts.Transaction);
            SqlConnection destinationConnection = (SqlConnection)rawTargetConnection;
            //MessageBox.Show("4");

            // Set up the bulk copy object.
            using (SqlTransaction transaction = destinationConnection.BeginTransaction())
            {
                using (SqlBulkCopy bulkCopy =
                    new SqlBulkCopy(destinationConnection, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, transaction))
                {
                    bulkCopy.DestinationTableName = strDestinationStatement;
                    bulkCopy.BatchSize = 10000;
                    bulkCopy.BulkCopyTimeout = 0;

                    // Set up the event handler to notify after 10000 rows.
                    bulkCopy.SqlRowsCopied +=
                        new SqlRowsCopiedEventHandler(OnSqlRowsCopied);
                    bulkCopy.NotifyAfter = 10000;

                    try
                    {
                        // Write from the source to the destination.
                        bulkCopy.WriteToServer(reader);
                        //MessageBox.Show("6");
                        transaction.Commit();
                        //MessageBox.Show("7");
                        Dts.TaskResult = (int)ScriptResults.Success; // only success when everything works fine
                        //MessageBox.Show("9");
                    }
                    catch (Exception ex)
                    {
                        Dts.Events.FireInformation(2, "Script failure for object:" + strDestinationStatement + " (" + strSelectStatement + ")", "error:" + ex.Message, "", 0, ref fireAgain);
                        //MessageBox.Show("1");
                        transaction.Rollback();
                        //MessageBox.Show("10");
                        MessageBox.Show("Script failure for object:" + strDestinationStatement + " (" + strSelectStatement + ")", "error:" + ex.Message);
                        Dts.TaskResult = (int)ScriptResults.Failure;
                    }
                    finally
                    {
                        // Close the SqlDataReader. The SqlBulkCopy
                        // object is automatically closed at the end
                        // of the using block.
                        reader.Close();
                    }
                }
            }
        }

        void OnSqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
        {
            long l = e.RowsCopied;
            Dts.Events.FireInformation(1, "Copy data for object:" + strDestinationStatement, "Copied so far..." + e.RowsCopied.ToString(), "", 0, ref fireAgain);
        }

        static class SqlBulkCopyHelper
        {
            static System.Reflection.FieldInfo rowsCopiedField = null;

            /// <summary>
            /// Gets the rows copied from the specified SqlBulkCopy object
            /// </summary>
            /// <param name="bulkCopy">The bulk copy.</param>
            /// <returns></returns>
            public static int GetRowsCopied(SqlBulkCopy bulkCopy)
            {
                if (rowsCopiedField == null)
                {
                    rowsCopiedField = typeof(SqlBulkCopy).GetField("_rowsCopied", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField | System.Reflection.BindingFlags.Instance);
                }
                return (int)rowsCopiedField.GetValue(bulkCopy);
            }
        }

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        ///
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
    }
}
You can ignore the commented-out message boxes - I put those in for something else.
Regards
Richard
March 8, 2022 at 11:11 am
OK, maybe this will help. Try adding the following 5 lines of code outside of, and after, the first using block:
...
                    finally
                    {
                        // Close the SqlDataReader. The SqlBulkCopy
                        // object is automatically closed at the end
                        // of the using block.
                        reader.Close();
                    }
                }
            }
            sourceConnection.Close();
            sourceConnection.Dispose();
            destinationConnection.Close();
            destinationConnection.Dispose();
            commandSourceData.Dispose();
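Or, as an alternative (a rough sketch only, untested, using the same variable names as your script), you could restructure Main() so that everything disposable is wrapped in using blocks, which guarantees disposal even when an exception is thrown:

            // Sketch only: same logic as the original Main(), but the command, reader,
            // transaction and bulk copy are all disposed automatically by their using
            // blocks, and the connections are disposed in the finally block.
            // (The error handling and event firing from the original are omitted for brevity;
            // an uncommitted transaction is rolled back when its using block disposes it.)
            SqlConnection sourceConnection = (SqlConnection)Dts.Connections["Source_Database_ADO"].AcquireConnection(Dts.Transaction);
            SqlConnection destinationConnection = (SqlConnection)Dts.Connections["Dest_Database_ADO"].AcquireConnection(Dts.Transaction);
            try
            {
                using (SqlCommand commandSourceData = new SqlCommand(strSelectStatement, sourceConnection))
                using (SqlDataReader reader = commandSourceData.ExecuteReader())
                using (SqlTransaction transaction = destinationConnection.BeginTransaction())
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection,
                    SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, transaction))
                {
                    bulkCopy.DestinationTableName = strDestinationStatement;
                    bulkCopy.BatchSize = 10000;
                    bulkCopy.BulkCopyTimeout = 0;
                    bulkCopy.WriteToServer(reader); // copy the rows across
                    transaction.Commit();
                }
            }
            finally
            {
                sourceConnection.Dispose(); // Dispose() also closes an open connection
                destinationConnection.Dispose();
            }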
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 8, 2022 at 11:18 am
Thanks Phil
I'll give that a go and let you know how I get on.
I've also got a timeout error on a fairly large table (90k rows) that I'm looking into, but that's a different story.
Regards
Richard
March 8, 2022 at 11:58 am
You should likely also change the BatchSize - 10k is normally well short of the best value - and the optimal value also depends on whether your destination table is a heap, has a clustered index, or is a clustered columnstore.
For a clustered columnstore, the optimal value is likely around 102,400 - or, for high volumes of data, 1,048,576 (the size of a rowgroup).
For the others, I normally start at 50k.
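As a rough illustration (the helper below is hypothetical, and the numbers are just my usual starting points, not hard rules):

        // Hypothetical helper, purely to illustrate the starting points above.
        // 102,400 rows is the minimum batch size for rows to be compressed straight
        // into a columnstore rowgroup; 1,048,576 rows is the maximum rowgroup size.
        static int StartingBatchSize(bool isClusteredColumnstore, bool highVolume)
        {
            if (isClusteredColumnstore)
                return highVolume ? 1048576 : 102400;
            return 50000; // heaps and clustered rowstore indexes
        }

        // e.g. bulkCopy.BatchSize = StartingBatchSize(isClusteredColumnstore: false, highVolume: false);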
March 8, 2022 at 1:00 pm
@Phil - that's helped: VS is now only using around 800MB max. I'm still getting memory errors, though. An example would be:-
Loading table: Data_Dictionary.Hepatitis_C_Infection_Indicator_SCD; Error Message: Error 0x80070008 while executing package from project reference package "FDSyncData_Source2Dest.dtsx". Not enough memory resources are available to process this command.
@frederico - these are all heap tables (don't ask) so I'll change the batch size to 50k and see if that helps.
Thanks folks
Richard
March 8, 2022 at 1:12 pm
Perhaps try modifying your Master package to spawn only four copies of GetNextTable?
Given that it bombed out at 4GB before, it's a bit strange that it only made it to 800MB this time. How big is the table it bombed on?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 8, 2022 at 1:45 pm
In reverse order .....
The table it bombed on is only 93k records or so (so not massive).
It's an external table in an Azure database and looks like this:-
CREATE EXTERNAL TABLE [CQC_Providers].[Register]
(
    [ID] [bigint] NOT NULL,
    [ADF_Pipeline_Run_ID] [varchar](50) NOT NULL,
    [providerId] [varchar](50) NOT NULL,
    [JSON] [nvarchar](MAX) NOT NULL,
    [page] [varchar](50) NOT NULL,
    [API_Call_Date] [datetime] NOT NULL
)
WITH (DATA_SOURCE = [RemoteReferenceData_CQC_Providers], SCHEMA_NAME = N'CQC_Providers', OBJECT_NAME = N'Register')
(I've just copied the CREATE statement for it).
There are no indexes on it (as far as I can see), which is potentially a problem, but the vendor has assured us it's quicker to download the entire table as a replacement for the current one and do a rename than to download just the new/updated records.
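If it helps to picture it, I imagine the swap at the end looks something like this (my own hypothetical sketch, not the vendor's actual code - the staging table name is a guess):

            // Hypothetical sketch of the vendor's "full reload then rename" swap;
            // Register_New is an assumed staging table name, and destinationConnection
            // is the same local connection used elsewhere.
            using (SqlCommand swapCommand = new SqlCommand(@"
                BEGIN TRANSACTION;
                EXEC sp_rename 'CQC_Providers.Register', 'Register_Old';
                EXEC sp_rename 'CQC_Providers.Register_New', 'Register';
                COMMIT;
                DROP TABLE CQC_Providers.Register_Old;", destinationConnection))
            {
                swapCommand.ExecuteNonQuery();
            }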
Bizarrely, I've run the entire routine again and that table processed without an issue. I'm starting to get the feeling I'm trying to process too many tables at once, so reducing the number of GetNextTable copies to 4 is my next step.
Updating the batch size to 50k seems to have helped a bit (no timeout errors so far). Task Manager currently has VS at 2.5GB, but it's fairly stable there.
Regards
Richard