An SSIS ETL package created with the Import/Export Wizard will process one data file at a time. There may be situations, however, when multiple data files will be arriving continuously for extraction to the same table. A base SSIS package generated using the Import/Export Wizard can be modified to process these files in batch mode.
In this tutorial we will take an SSIS package generated by the Import/Export wizard and modify it to process multiple files and move those files to a backup directory.
1: Download the tutorial files
The zip file attached to this tutorial contains a default SSIS package (BaseBatchETL.dtsx) generated by the Import/Export Wizard, the same SSIS package modified to process multiple files in batch mode (BatchETL.dtsx), a SQL script (CreateCustomersTable.sql) for generating the ETL destination table, and six data files (NewCustomersXX.txt) for extraction into the destination table.
2: Create the destination table with the CreateCustomersTable.sql script
Open the CreateCustomersTable.sql script in SQL Server Management Studio.
USE [TestDB] GO /****** Object: Table [dbo].[customers] Script Date: 05/10/2013 15:46:30 ******/SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[customers]( [FirstName] [nvarchar](100) NULL, [LastName] [nvarchar](100) NULL, [Company] [nvarchar](100) NULL, [Address] [nvarchar](100) NULL, [City] [nvarchar](100) NULL, [County] [nvarchar](100) NULL, [State] [nvarchar](100) NULL, [ZIP] [nvarchar](100) NULL, [Phone] [nvarchar](100) NULL, [Fax] [nvarchar](100) NULL, [Email] [nvarchar](100) NULL, [Web] [nvarchar](100) NULL ) ON [PRIMARY] GO
Execute the script to create the destination table.
Browse to the new table to confirm that it was created.
3: Create source and backup folders and place the data files in the source folder
Create the folders C:\BatchETL and C:\BatchETL\backup and copy the six data files to the C:\BatchETL folder.
4: Open, configure and execute BaseBatchETL.dtsx in Business Intelligence Development Studio
Open the BaseBatchETL.dtsx SSIS package in BIDS to show that it consists of a Data Flow Task generated by the Import/Export Wizard.
Double-click the DestinationConnectionOLEDB connection manager to bring up the Connection Manager configuration panel, change the server and database names to the server and database where you created the customers table and click the OK button.
Double-click on the SourceConnectionFlatFile connection manager to show that the connection manager is pointing to the NewCustomers01 data file in the C:\BatchETL\ folder.
Execute the BaseBatchETL package.
Confirm that the 5000 records in the NewCustomers01.txt file have been added to the customers table.
Truncate the customers table to prepare for execution of the BatchETL.dtsx package we will create from the BaseBatchETL.dtsx package.
5. Add a Foreach Loop Container to BaseBatchETL.dtsx
Drag-and-drop a Foreach Loop Container from the Toolbox into the Control Flow panel.
Drag the Data Flow Task into the Foreach Loop Container.
6. Configure the Foreach Loop Container
Click on the Variables tab and ad the global variable FilePath.
Double-click on the Foreach Loop Container to bring up the Foreach Loop Editor, then click on the Collection node.
Change the contents of the Folder textbox to C:\BatchETL\ and the Files textbox to NewCustomers*.txt, then click on the Variable Mappings node.
Select the Variable User::FilePath and set the Index to zero, then click the OK button to close the Foreach Loop Editor.
7. Configure the SourceConnectionFlatFile connection manager
Click on the SourceConnectionFlatFile connection manager to bring it up in the properties window. Click on the button next to the Expressions parameter...
...to bring up the property expressions editor. Set the Connection string property to the FilePath global variable we created earlier, then click the OK button to return to the main editing window.
At this point the package will process multiple files. We are now going to add a script task to move each data file to backup as it is extracted.
8. Add a script task for moving the data files to backup
Drag-and-drop a script task from the toolbox into the Foreach Loop Container.
Add a precedence constraint from the data flow task to the script task.
Double-click on the script task to bring up the Script Task Editor. Add the read-only variable FilePath and click the OK button.
Click on the Edit Script button to bring up the script task code editor and enter the following Visual Basic code.
Option Strict On Imports System Imports System.IO Imports System.Data Imports Microsoft.SqlServer.Dts.Runtime <System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum Public Sub Main() Dim DestinationDirectory As String = "C:\BatchETL\backup\" Dim file_name As String() = Split(Dts.Variables("FilePath").Value.ToString, "\") Dim input_file As New FileInfo(Dts.Variables("FilePath").Value.ToString) Try input_file.MoveTo(DestinationDirectory & file_name(2)) Catch ex As Exception End Try End Sub End Class
After entering the code click the button to save it, then close the window.
9. Execute the finished package
Click on the button to execute the package to process all the files in the c:\BatchETL\ folder.
Open the customers table to confirm that the 30,000 records in the six data files have been inserted.
Open the C:\BatchETL\ folder to confirm that the data files have been removed from the source folder.
Open the C:\BatchETL\backup\ folder to confirm that the data files were moved to the backup folder.
Summary
To configure a data flow task created by the SSIS Import/Export wizard to process mutliple data files:
- Add a Foreach Loop Container to the Control Flow window of the SSIS package
- Move the data flow task into the Foreach Loop Container
- Add a global string variable named FilePath to the package
- Configure the Foreach Loop Container Collection-Folder text box to point to the the folder where the data files will reside
- Configure the Foreach Loop Container Collection-Files text box to contain the appropriate root name of the data files
- Configure the Foreach Loop Container Variable Mappings-Variable to the FilePath variable and give it an index of zero
- Configure the ConnectionString property of the SourceConnectionFlatFile connection manager to the FilePath variable
- Add a script task to move the data files to a backup folder so that the same files will not be extracted multiple times