Looping through a database table and setting variables from values retrieved in SSIS

  • Hi guys,

    I am new to SSIS and am faced with the following challenge:

    I have an SSIS package that takes 2 parameters as inputs (input1: filename_men, input2: filename_women) and then processes the data. This package works fine. Its only drawback is that it currently requires us to manually input 2 variables (filename_men and filename_women) for every set of files we process.

    Often there are a lot of files to process and doing it manually takes a lot of time.

    I am trying to automate the process, and have ideas, but not sure how to implement them using SSIS.

    My plan is to

    1. Create a table with the following structure:

    create table t

    (

    id int not null identity(1,1),

    filename_men varchar(255) not null,

    filename_women varchar(255) not null,

    status varchar(30) not null

    ) on [primary]

    2. Populate the table with the file names, and set status to 'Unprocessed' for all of them initially

    3. Create a new SSIS package, which will

    - loop through the table 't'

    - pick up each row where status = 'Unprocessed'

    - store filename_men and filename_men in variables @filename_men and @filename_women, respectively

    - pass these variables into the original package ("Execute Package Task"?) so that it can execute

    - update the processed row in table 't' to have status = 'Processed'

    Step 3 (obviously) is a loaded step, and I am not sure how to accomplish it.

    Any help/direction would be appreciated.

    Thank you!

  • I'm a little fuzzy on your needs, but I think you may be looking to use a For Each Loop Container. An FELC is a task container that allows you to iterate through a collection (like a collection of files in a folder). The FELC allows you to specify the location of the files, along with a file name or wild card string (e.g., *.csv). As it iterates, it will then pass the name of the current file (under Variable Mappings) to any tasks contained within the container.

    For example, the FELC iterates through all files contained within the specified folder, and identifies each one with a name matching the wild card string. A Data Flow Task inside this FELC can then receive the name and path of the file via variables, connect to the file and load the data to the database.

    If the 'men' and 'women' files are processed differently, perhaps consider one of two options. First, if the file name contains a string that identifies which file type it is, add a Script Task to your FELC to parse out the substring and determine which file type it is, etc. If the file doesn't contain an identifying string, consider creating separate folders for men and women files (sounds very 1920's, doesn't it). Then have a different control flow, a different package, or a runtime flag of some sort to manage the different file types.

    You shouldn't need the file tables you're describing (unless it's a business requirement). A common mistake among SQLers (one that I've certainly made) is thinking you need the database objects when using SSIS. Another way of saying this is, challenge yourself to avoid using the Execute SQL Task. That task is not a bad thing in and of itself, but some of what you're describing can be done without involving the database, and your performance can be a lot better as a result.

    Google the following: control flow tasks (specifically, Script Task, File System Task), data flow task and data flow transformations, connection managers. If you need to record the status of your package or the files being imported, consider using SSIS logging, though it may not be specific enough to your needs.

    Take a look at the two attachments I posted. It's a fairly simple control flow that looks for files in a folder matching a string, then loads them to the db and archives them. The data flow task is slightly more complex, in that it massages the data before loading.

  • Hi goofbauer,

    Thank you for your suggestions.

    Perhaps my original post was a bit confusing.

    I currently have an SSIS package (let's call it SSIS 1) that reads a folder and loads men/women files into separate base tables.

    I also have an SSIS package (let's call it SSIS 2) that loads the data from 2 base tables into the fact/dimension tables (merging the data in the process based on the matching men/file names). Which men file matches which women file is currently specified by 2 variables set manullay in SSIS 2.

    Since I need to process 100 such pairs, I prefer not to set them manually each time, and run each time, but rather have the matching pairs stored in a table, and picked up, pair at a time, passed to SSIS 2, and processed.

    That is the ultimate goal for SSIS 3 (get pair, call SSIS 2 specifying the pair, mark pair is processed, and move on to next pair, until no more to process)

    I am currently trying to use For Loop, but am still struggling with the approach.

    Please advise.

    Thank you!

  • Ok, to be clear, let's address the For Loop issue first.

    I think you want a For Each Loop Container, not a For Loop Container. A For Loop is best for evaluating an expression, whereas a For Each Loop is for iterating through a collection. You will see in the properties of the For Each Loop Container that there are several properties tailored for looping through a collection of files. See the two most recent attachments I added to illustrate what I'm about to say.

    First, the Collection pane of the FELC properties editor allows you to specify a folder and file string to use. These values tell the package where the files are. Second, the Variable Mappings pane allows you to specify the variable that will hold the name of the found file. From there, any tasks contained within that container will have access to the file name and can then perform actions on the file. But keep in mind that the container is only outputing a string that identifies the name of the file. It is not holding the file object itself in memory.

  • On the men/women thing, do the file names contain a substring that identifies the gender contained within the file? And if so, is that substring consistent over time, so that it can be identified using a wild card string? For example, *_men.csv and *_women.csv.

  • I found that working with Fore Each Loop Container on files I needed better information about the file properties. CodePlex has a File Properties Task that you can download that make it easy to cature file properties and even change them. http://filepropertiestask.codeplex.com/

  • Hi goofbauer,

    I was able to accomplish my task!

    Thanks again for all the suggestions.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply