Using the Script Task in SSIS to Process Data Files When They Arrive

  • tonybogyo (9/8/2014)


    opc.three (10/25/2012)


    dave-dj (10/25/2012)


    excellent and comprehensive article.

    I only have a couple of questions..

    first - if you where required to process multiple files per day, would you adapt this process, or would you use a process similiar to the others whereby the package is run every 5 minutes (and subsequently loaded and unloaded from memory)?

    If it were part of the requirement to process multiple files per day then you have some options. In keeping with the idea that we want our SSIS package to run continuosly, and not have to stop and start it every so often, I would look into putting the entire set of tasks into a For Loop with a condition of 'where 1=1'. This would effectively bring us back to the 'watcher' after processing a file. If there was a possibility for a file to arrive during the processing of a previous file, then you could consider setting WatcherInputFindExistingFiles to True to ensure that file were processed.

    Thank you so much for taking the time to document this project - it has been incredibly helpful to me as I needed to do a very similar task - great stuff!

    Thanks for the feedback.

    I have a situation where I need the watcher to be active fairly continually as files dropped in the drop folder could be dropped at any time, possibly multiple times a day. I Built all the tasks in the project and everything worked well, so I incorporated them all in a For Loop with a condition of 1 == 1. When I start the project and drop in a few files they process through without issue - everything works a planned and I can see the script task sitting there in yellow watching for new files. The problem is, after the project process a first file or group of files, subsequent files dropped into the drop directory do not get picked up and I don't know why - perhaps something needs to be reset?

    I have not seen it make any difference one way or the other but some have had success setting the EnableRaisedEvents property to true before starting to wait. See if this earlier post on this thread is relevant:

    http://www.sqlservercentral.com/Forums/FindPost1436810.aspx

    Perhaps running in a loop isn't the best plan anyway - it looks like there could be some concerns with CPU usage. I could just call a non-looped package as a SQL Job Agent every few minutes, but unless I have the tasks in a loop it appears to only process 1 file at a time - if the system dropping files suddenly drops 50 files in the directory I'd need the Job Agent to restart 50 times to process through all of them - there's got to be a better way - any suggestions?

    As far as I know there are no concerns surrounding CPU. When the Script Task waits, 0% CPU would be used. When files arrive, it would process them, then hit your 1=1 loop, then begin waiting again at 0% CPU. It is true this demo code was built with single-file processing in mind. To get subsequent files you would need to run the Package again, or modify the Package to loop, and I think you are now in that place where you have the option to go either direction. I would setup a loop because as you see, you would need to wait for 50 job executions to process 50 files, which could be too long to wait. With a loop you would run through those files as fast as the Package could process them.

    I should mention I'm using 2008R2 and used the updated script posted in this thread for my script task (thank you again for posting that!)

    You're very welcome!

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    I feel this is a great article. It is very useful to me.

    I want to know about when the multiple files arrived in drop directory, all files should be moved to processed at single run.

  • sprasannamsbi (10/18/2014)


    Hi,

    I feel this is a great article. It is very useful to me.

    I want to know about when the multiple files arrived in drop directory, all files should be moved to processed at single run.

    Thanks for the kind comment. Please see my response to an earlier comment that speaks to the issue of possibly processing multiple files:

    http://www.sqlservercentral.com/Forums/FindPost1376979.aspx

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    Thanks for ur reply. I am sure this helps me lot. But i have one more doubt, how to log all the filenames into custom table which are in existed in drop directory and its subdirectories. I want to log filenames,no of files found or no file exist information into custom table . bcoz i need to do the filename and header validation before i load the data. i pick the filenames which are logged into customtable and i validate with the table which i have.

    I am awaiting for your early reply. Thanks in advance.

    Kindly help me in this.

    Thanks,

    Prasanna.

  • Read up on the Foreach Loop Container and the Execute SQL Task. You'll want to make a call to your database to insert the details of each file within the loop using an Execute SQL Task. Let me know if you have specific questions down the line after trying it out.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for the great article. I'm using SQL Server 2012 and all work well, until I add a Send Mail task at the end. The object of the exercise is to watch for files dropped into a directory (more that one, so I've used a For Loop), then process the data, then email it. On the second run it fails on the Send Mail Task with "The operation has timed out". The very last task is File System Task which copies the file to an 'Archive' directory.

    Any idea what might be causing the failure?

    Thanks in advance...

  • Rich Parsons (2/18/2015)


    Thanks for the great article. I'm using SQL Server 2012 and all work well, until I add a Send Mail task at the end. The object of the exercise is to watch for files dropped into a directory (more that one, so I've used a For Loop), then process the data, then email it. On the second run it fails on the Send Mail Task with "The operation has timed out". The very last task is File System Task which copies the file to an 'Archive' directory.

    Any idea what might be causing the failure?

    Thanks in advance...

    Hi Rich,

    Attach your dtsx file to this thread or to a Private Message to me and I'll take a look.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks Orlando - I've PM'd you but can't attached the dtsx file.

  • Hi Orlando - no, I still can't get the Send Mail Task to work within the infinite loop. As I suggested, it consistently fails on the second run with a 'process has timed out' error. This all despite recreating the entire package from scratch and redefining all the variables, etc. Originally, I used the WMI event watcher and the Send Mail worked fine, though the watcher didn't do its job consistently (it seemed to randomly miss files dropped into the designated directory). Thus I changed to the script task, only to be frustrated yet again.

    Perhaps the archived file (the email attachment) is locked from the first run?

    Thanks for all your help/suggestions so far.

  • Ah ah - just sussed out the problem. I changed the Timeout setting on the SMTP Connection Manager to be 15000 milliseconds. All now works well.

  • Hello Orlando, hello All,

    first: Thanks Orlando for this great walk through, it works perfect and it helped me to go around the FileWatcher Task, which IT refused to install on the SQL server (I run the tasks in the past from my laptop and change it now).

    As I write normally in vb.net, I tried to translate the script code via different translation tools but it seemes the code is to complex for these tools. The vb.net versions I got out ended always in errors. Do you or someone else has a vb.net version or a capable translater? Such a version would allow me to understand the code even better and to learn more.

    Thanks

    Best regards

    Boris

  • Hi Boris, Thanks for the kind feedback.

    I would like to help but must admit that I am woefully ill-equipped to write VB.net code these days. I am sorry but I won' be able to ramp up either so won't be able to help on this one.

    I started out learning VB.net version 1 because I was a VB6 developer before that. However, a more experienced developer that worked with me who was very plugged into the community (I was only getting started in the field) explained how C# would eventually become the default .NET language and I never looked back. The $40 book I purchased and had planned to use to learn VB.net ended up going unread. I am not sorry I did either, as he was 100% correct.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • yeah, unfortunatly you're right that vb.net is anything but the nr one .net language :ermm:

  • Boris, try this: http://converter.telerik.com/

  • Thank's Dave,

    I forgot the "Enable Tab Key" check mark, after setting this the code has been translated, althought I had to tweak some lines.

    I testet the code (one file, several files, files in use and not in use) and didn't experienced any more errors.


    #Region "Help: Introduction to the script task"

    ' The Script Task allows you to perform virtually any operation that can be accomplished in

    ' * a .Net application within the context of an Integration Services control flow.

    ' *

    ' * Expand the other regions which have "Help" prefixes for examples of specific ways to use

    ' * Integration Services features within this script task.

    #End Region

    #Region "Namespaces"

    Imports System.Data

    Imports Microsoft.SqlServer.Dts.Runtime

    Imports System.Windows.Forms

    Imports System.IO

    Imports System.Threading

    #End Region

    Namespace SqlServerCentral

    ''' <summary>

    'ScriptMain is the entry point class of the script. Do not change the name, attributes,

    'or parent of this class.

    <Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute()> _

    <System.CLSCompliantAttribute(False)> _

    Partial Public Class ScriptMain

    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    #Region "Help: Using Integration Services variables and parameters in a script"

    ' To use a variable in this script, first ensure that the variable has been added to

    ' * either the list contained in the ReadOnlyVariables property or the list contained in

    ' * the ReadWriteVariables property of this script task, according to whether or not your

    ' * code needs to write to the variable. To add the variable, save this script, close this instance of

    ' * Visual Studio, and update the ReadOnlyVariables and

    ' * ReadWriteVariables properties in the Script Transformation Editor window.

    ' * To use a parameter in this script, follow the same steps. Parameters are always read-only.

    ' *

    ' * Example of reading from a variable:

    ' * DateTime startTime = (DateTime) Dts.Variables["System::StartTime"].Value;

    ' *

    ' * Example of writing to a variable:

    ' * Dts.Variables["User::myStringVariable"].Value = "new value";

    ' *

    ' * Example of reading from a package parameter:

    ' * int batchId = (int) Dts.Variables["$Package::batchId"].Value;

    ' *

    ' * Example of reading from a project parameter:

    ' * int batchId = (int) Dts.Variables["$Project::batchId"].Value;

    ' *

    ' * Example of reading from a sensitive project parameter:

    ' * int batchId = (int) Dts.Variables["$Project::batchId"].GetSensitiveValue();

    ' *

    #End Region

    #Region "Help: Firing Integration Services events from a script"

    ' This script task can fire events for logging purposes.

    ' *

    ' * Example of firing an error event:

    ' * Dts.Events.FireError(18, "Process Values", "Bad value", "", 0);

    ' *

    ' * Example of firing an information event:

    ' * Dts.Events.FireInformation(3, "Process Values", "Processing has started", "", 0, ref fireAgain)

    ' *

    ' * Example of firing a warning event:

    ' * Dts.Events.FireWarning(14, "Process Values", "No values received for input", "", 0);

    ' *

    #End Region

    #Region "Help: Using Integration Services connection managers in a script"

    ' Some types of connection managers can be used in this script task. See the topic

    ' * "Working with Connection Managers Programatically" for details.

    ' *

    ' * Example of using an ADO.Net connection manager:

    ' * object rawConnection = Dts.Connections["Sales DB"].AcquireConnection(Dts.Transaction);

    ' * SqlConnection myADONETConnection = (SqlConnection)rawConnection;

    ' * //Use the connection in some code here, then release the connection

    ' * Dts.Connections["Sales DB"].ReleaseConnection(rawConnection);

    ' *

    ' * Example of using a File connection manager

    ' * object rawConnection = Dts.Connections["Prices.zip"].AcquireConnection(Dts.Transaction);

    ' * string filePath = (string)rawConnection;

    ' * //Use the connection in some code here, then release the connection

    ' * Dts.Connections["Prices.zip"].ReleaseConnection(rawConnection);

    ' *

    #End Region

    #Region "Instance variables"

    ' we need access to the found file info from the FileSystemWatcher OnFileCreate event in our class

    ' scope. an instance variable may look odd but will do for our purposes

    Private foundFile As FileInfo = Nothing

    #End Region

    #Region "Method: Sub Main()"

    ''' <summary>

    ''' This method is called when this script task executes in the control flow.

    ''' Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

    ''' To open Help, press F1.

    ''' </summary>

    Public Sub Main()

    Try

    ' initialize common variables from DTS variables collection

    Dim dropDirectory As String = Dts.Variables("User::WatcherInputDropPath").Value.ToString()

    Dim fileMask As String = Dts.Variables("User::WatcherInputFileMask").Value.ToString()

    Dim includeSubdirectories As Boolean = Convert.ToBoolean(Dts.Variables("User::WatcherInputIncludeSubdirectories").Value)

    ' look for existing files if configuration suggests we should

    Dim findExistingFiles As Boolean = Convert.ToBoolean(Dts.Variables("User::WatcherInputFindExistingFiles").Value)

    If findExistingFiles Then

    FindExistingFile(dropDirectory, fileMask, includeSubdirectories)

    End If

    ' do we (still) need to look for a file?

    If foundFile Is Nothing Then

    ' if we made it here there were no existing files to process (or we didn't check for them per the

    ' configuration variables) so setup a FileSystemWatcher object per the configuration variables

    ' bool timeoutAsWarning = true;

    Dim timeoutAsWarning As Boolean = Convert.ToBoolean(Dts.Variables("User::WatcherInputTimoutAsWarning").Value)

    Dim timeoutSeconds As Integer = Convert.ToInt32(Dts.Variables("User::WatcherInputTimeoutSeconds").Value)

    Dim timeoutMilliseconds As Integer = (If(timeoutSeconds = 0, -1, timeoutSeconds * 1000))

    WatchForFileCreation(dropDirectory, fileMask, includeSubdirectories, timeoutAsWarning, timeoutMilliseconds)

    End If

    Dts.TaskResult = CInt(ScriptResults.Success)

    Catch e As Exception

    Dts.Events.FireError(0, Nothing, e.Message, String.Empty, 0)

    Dts.TaskResult = CInt(ScriptResults.Failure)

    End Try

    End Sub

    #End Region

    #Region "Event: void OnFileCreate(object source, FileSystemEventArgs e)"

    ''' <summary>

    ''' Event attached to FileSystemWatcher when a file is created.

    ''' </summary>

    ''' <param name="source">Event source.</param>

    ''' <param name="e">Event arguments.</param>

    Private Sub OnFileCreate(source As Object, e As FileSystemEventArgs)

    PreProcessFoundFile(New FileInfo(e.FullPath))

    End Sub

    #End Region

    #Region "Method: WatchForFileCreation"

    ''' <summary>

    ''' Sets up a FileSystemWatcher to watch for new files being created.

    ''' </summary>

    ''' <param name="dropDirectory">Directory to watch</param>

    ''' <param name="fileMask">File pattern mask of files being watched for.</param>

    ''' <param name="includeSubdirectories">If true all subdirectories are also watched.</param>

    ''' <param name="timeoutAsWarning">If true then if watcher times out only a warning is raised, i.e. the Task succeeds.</param>

    ''' <param name="timeoutMilliseconds">Number of milliseconds to wait for a file to be initially created. This timeout period

    ''' does not apply to the tiem spent waiting for exclusive access to be gained to the file.</param>

    Private Sub WatchForFileCreation(dropDirectory As String, fileMask As String, includeSubdirectories As Boolean, timeoutAsWarning As Boolean, timeoutMilliseconds As Integer)

    ' create a new FileSystemWatcher

    Dim fileSystemWatcher As New FileSystemWatcher()

    ' set the path to watch to our 'drop directory'

    fileSystemWatcher.Path = dropDirectory

    ' set the option to watch subdirectories

    fileSystemWatcher.IncludeSubdirectories = includeSubdirectories

    ' set the filter of files to watch for to our 'file mask'

    fileSystemWatcher.Filter = fileMask

    ' add event handler to execute when new files are created

    AddHandler fileSystemWatcher.Created, AddressOf Me.OnFileCreate

    ' begin watching

    fileSystemWatcher.WaitForChanged(WatcherChangeTypes.Created, timeoutMilliseconds)

    If foundFile Is Nothing Then

    ' the file watcher timed out waiting for a file :-<

    Dim message As String = (Convert.ToString((Convert.ToString("Timeout waiting for file {Path='") & dropDirectory) + "'; Filter='") & fileMask) + "'; IncludeSubdirectories=" + includeSubdirectories.ToString() + "}."

    If timeoutAsWarning Then

    ' only raise a warning

    Dts.Events.FireWarning(0, Nothing, message, String.Empty, 0)

    Else

    ' raise an error

    Throw New TimeoutException(message)

    End If

    End If

    End Sub

    #End Region

    #Region "Method: void PreProcessFoundFile(FileInfo dataFile)"

    ''' <summary>

    ''' Takes actions subsequent to locating a file that allow later processing of the file. This method

    ''' reports information to the parent container by firing info events. This method also ensures exclusive

    ''' access to the file can be achieved before returning control to the parent container.

    ''' </summary>

    ''' <param name="dataFile">File to preprocess.</param>

    Private Sub PreProcessFoundFile(dataFile As FileInfo)

    ' set the instance variable value to the found file

    Me.foundFile = dataFile

    ' local variable to pass to events that require parameters be passed by ref

    Dim fireAgain As Boolean = True

    ' raise an information event saying we found a file (not necessarily that it can be used)

    Dts.Events.FireInformation(0, Nothing, "File found: " + dataFile.FullName, String.Empty, 0, fireAgain)

    ' We know there is a new file that can be processed because

    ' the FileSystemWatcher fired an event, however we do not know if the user or process

    ' supplying the file has completed uploading it. We will loop over drop directory

    ' looking for files that meet our criteria and once we find one we will make sure

    ' the supplier has completed their upload process by checking to see if we can gain

    ' exclusive access to the file. Once we can gain exclusive access to the file we will know

    ' the upload is complete and we can allow the rest of the SSIS package to continue.

    WaitForExclusiveAccess(dataFile)

    ' store the full file name (includes path) in output variable

    Dts.Variables("User::WatcherOutputFileFullName").Value = dataFile.FullName

    ' store the file name in output variable

    Dts.Variables("User::WatcherOutputFileName").Value = dataFile.Name

    ' raise an information event saying we found a file -and- it can be used

    Dts.Events.FireInformation(0, Nothing, "File ready for use: " + dataFile.FullName, String.Empty, 0, fireAgain)

    End Sub

    #End Region

    #Region "Method: void WaitForExclusiveAccess(FileInfo dataFile)"

    ''' <summary>

    ''' Waits until exclusive access to a file can be achieved.

    ''' </summary>

    ''' <param name="dataFile">File to access.</param>

    Private Sub WaitForExclusiveAccess(dataFile As FileInfo)

    ' local variable to say how many seconds to wait in between checking if we can gain

    ' exclusive access to the found file

    Dim secondsToWaitBetweenAttempts As Integer = 5

    ' local variable to pass to events that require parameters be passed by ref

    Dim fireAgain As Boolean = True

    ' Loop indefinitely checking if we can access the data file.

    While 1 = 1

    Try

    ' Attempt to gain access to the file.

    Using stream As Stream = New FileStream(dataFile.FullName, FileMode.Open)

    ' If we made it here no exception was thrown meaning we

    ' could access the file. We will break out of the loop and allow

    ' the rest of the package to continue processing.

    Exit Sub

    End Using

    ' We are not interested in ending the program when an IOException

    ' occurs in this area. This type of exception means we could not

    ' gain access to the file.

    ' In general, programming algorithms that leverage exceptions for

    ' control flow are frowned upon. However in the case of file access

    ' it is an acceptable pattern.

    Catch generatedExceptionName As IOException

    ' raise an information event saying we could not gain exclusive access to the found file and will wait

    Dts.Events.FireInformation(0, Nothing, "Could not gain exclusive access to file " + foundFile.FullName + ". Waiting " + secondsToWaitBetweenAttempts.ToString() + " seconds before trying again...", String.Empty, 0, fireAgain)

    ' wait some time before checking whether the file can be used

    Thread.Sleep(secondsToWaitBetweenAttempts * 1000)

    End Try

    End While

    End Sub

    #End Region

    #Region "Method: void FindExistingFile(string directoryName, string fileMask, bool includeSubdirectories)"

    ''' <summary>

    ''' Check a directory for files that match a file mask.

    ''' </summary>

    ''' <param name="directoryName">Directory to look for files.</param>

    ''' <param name="fileMask">File pattern mask matching files to look for.</param>

    ''' <param name="includeSubdirectories">True if subdirectories should also be checked.</param>

    Private Sub FindExistingFile(directoryName As String, fileMask As String, includeSubdirectories As Boolean)

    ' local variable to pass to events that require parameters be passed by ref

    Dim fireAgain As Boolean = True

    ' get the list of files that qualify

    Dim directoryInfo As New DirectoryInfo(directoryName)

    Dim fileInfos As FileInfo()

    If includeSubdirectories Then

    fileInfos = directoryInfo.GetFiles(fileMask, SearchOption.AllDirectories)

    Else

    fileInfos = directoryInfo.GetFiles(fileMask, SearchOption.TopDirectoryOnly)

    End If

    ' check to see if any files were found

    If fileInfos.Length > 0 Then

    ' found a file!

    PreProcessFoundFile(fileInfos(0))

    ' raise an info message

    Dts.Events.FireInformation(0, Nothing, "Existing files found: " + fileInfos.Length.ToString(), String.Empty, 0, fireAgain)

    Else

    ' no files found, raise a warning

    Dts.Events.FireWarning(0, Nothing, "No existing files found.", String.Empty, 0)

    End If

    End Sub

    #End Region

    #Region "ScriptResults declaration"

    ''' <summary>

    ''' This enum provides a convenient shorthand within the scope of this class for setting the

    ''' result of the script.

    '''

    ''' This code was generated automatically.

    ''' </summary>

    Private Enum ScriptResults

    Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success

    Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure

    End Enum

    #End Region

    End Class

    End Namespace

Viewing 15 posts - 46 through 60 (of 62 total)

You must be logged in to reply to this topic. Login to reply