SSIS import of multiple, variably named CSV files

  • Hello experts,

    I'm working on a way to automatically import files given the following scenario. I have gotten to the point where I can do it manually in under 10 minutes, but my boss has reasonably requested that this be automated and scheduled.

    It seems that there are many ways to do this, but each one has its own wrinkles. At the moment, it seems like an SSIS package is the way to go, but SSIS has some elements that seem quite complex (bordering on byzantine when it comes to handling variables), so I need some help in formulating a design.

    Here is the scenario.

    1. Each morning, a script places 3 files in a share on a db server:

    File_Name1_MM-DD-YYYY-NN

    File_Name2_MM-DD-YYYY-NN

    File_Name3_MM-DD-YYYY-NN

    2. Additionally, at some point during the day, someone will request for one or more of the files to be re-run, so the folder contents might look like this on a given day:

    File_Name1_03-25-2016-10

    File_Name2_03-25-2016-10

    File_Name3_03-25-2016-10

    File_Name3_03-25-2016-11

    I know this is messy - for example, the NN can be numbers from 01 to 99 without a fixed pattern that I can determine - but for now it is what I am dealing with. If necessary I will ask the admin who has scripted the export (1) what the pattern for the last number is or (2) whether we can simplify the naming convention here. Also the files begin with completely different words, not the same name with a number appended.

    But my question for now is - within an SSIS package, I have found via Google searches, methods for looping over files, etc., but none for looping in a folder like the one in item 2 above. Is there a way I can create an SSIS package that does the following?

    A. Look in the folder for files that start with the patterns File_Name1_MM-DD-YYYY, File_Name2_MM-DD-YYYY, or File_Name3_MM-DD-YYYY

    B. For any files that are found, go back and see if there are multiple copies of files like that - for example the files File_Name3_03-25-2016-10 and File_Name3_03-25-2016-11.

    C. For files of the kind in B, use the file with the highest last number (File_Name3_03-25-2016-11)

    D. Once we have the files we need, proceed to import each file into its own separate table - I already have the tables and a way to handle populating them, so that is not an issue.

    Ideally, I want to get these files into SQL tables for processing as soon as possible because the CSV format drives me nuts. But I'd be grateful for any help you can provide. I just have a feeling I am overcomplicating my thinking on this and as a result am missing what could be a straightforward solution.

    Thanks!

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner (3/25/2016)


    Hello experts,

    ...

    1. Each morning, a script places 3 files in a share on a db server:

    File_Name1_MM-DD-YYYY-NN

    File_Name2_MM-DD-YYYY-NN

    File_Name3_MM-DD-YYYY-NN

    2. Additionally, at some point during the day, someone will request for one or more of the files to be re-run, so the folder contents might look like this on a given day:

    File_Name1_03-25-2016-10

    File_Name2_03-25-2016-10

    File_Name3_03-25-2016-10

    File_Name3_03-25-2016-11

    I know this is messy - for example, the NN can be numbers from 01 to 99 without a fixed pattern that I can determine - but for now it is what I am dealing with. If necessary I will ask the admin who has scripted the export (1) what the pattern for the last number is or (2) whether we can simplify the naming convention here. Also the files begin with completely different words, not the same name with a number appended.

    Hello yourself. Question the first. Are all files (regardless of name) the same format? Same # of columns? Same data types in the same order?

    If so, you can use a single For Each Loop container to search on the "File_Name" part of the file's name, with wildcards after. So it would be "File_Name*" in the Files section of the Collection. You'll want to store the file name in a variable, so make sure to retrieve what part of the file name you'll need (Fully qualified takes the entire path, file name, and extension where as Name only takes the name and Name and extension takes the name and the extension). Then flip over to Variable Mappings, assign a variable name and use the Index of 0 (meaning the first value) so it captures the actual name the Collection is grabbing.

    If the answer to the question is "no," then you're going to need to build a For Each Loop for each different type of file. Or maybe a separate SSIS package for each file type, depending on how you prefer to do your SSIS maintenance.

    After you've got this all set up, in the loop itself, you do your import task, using the variable to reference the file name for all other tasks.

    Does that make sense?

    EDIT: If you are "repeating" files, you'll want to add a column in your table for file name or some other reference so you can delete previously inserted records. I usually do this with the filename in the table, then have an Execute SQL task do the DELETE FROM MyTable WHERE FileName = ? (with the ? being mapped to a variable on one of the other task tabs).

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • You create table and stored procedure to do this. First populate the table with the name of all the files in the folder. Then call the stored procedure to do any kind of manipulation to get the file name as per your logic. Pass this file name to the SSIS to get the contents of the file. - Just a thought 😀

  • TheAccidentalDBA (3/28/2016)


    You create table and stored procedure to do this. First populate the table with the name of all the files in the folder. Then call the stored procedure to do any kind of manipulation to get the file name as per your logic. Pass this file name to the SSIS to get the contents of the file. - Just a thought 😀

    Honestly, that is a lot harder than using the native SSIS components to do the processing.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thanks all for your replies! Especially given the crazy nature of the file setup. I think I have enough advice to figure out what to try (the idea of populating a table as TheAccidentalDBA suggested sounds promising to keep track of the files, as they all have the same CSV formats but different name patterns and column layouts).

    Thanks again!!

    I will post back with what I end up devising.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • I agree with Brandie here. All of this can be done in SSIS with the native components. The biggest challenge is file formats. You'll need a dataflow/loop/package for each file format. SSIS is very metadata sensitive so you have to create a process for each file type. But handling the naming changes can easily be done with the For Each container and expressions. If you like the idea of storing the file names in a table for auditing/history, you can do that via SSIS also as you are processing the files. One thing to maybe make it easier is to create sub folders for each file type so your for each container can just grab everything in it's respective sub-folder and you don't have to worry about creating the date formats with expressions. Once the package processes a file, move it out of the folder to an archive location.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply