Can you help me import 2,000+ different CSVs?

  • Hi all,

    First time here so please bear with me,

    We have recently received a data dump from one of our suppliers of over 2,000 CSV files.

    The first 19 fields are always the same but after that the data is specific to each file.

    I have routines in place that can stitch the data together again once it's been loaded, but my problem is that so far I have had to load the files manually, one by one.

    There is the potential that I will have a further 8,000 files, so ideally I'm looking for an SSIS loop solution if possible!

    So to recap:

    I'm running SQL 2005, with full access.

    I'm looking to import a large volume of CSV files in different formats, each into its own table inside my database.

    If they were all the same format I would normally use a Foreach Loop, but I am not sure how that would be applicable here.

    Thanks for reading,

    Please post if you can!

    Well, for the first 19 columns you could use a Foreach Loop.

    But you can't use a flat file source/destination if your columns change each time, so you're stuck with scripting.

    One alternative is to read the CSV files with a minimum of columns and then split out the remaining columns in your data flow. But you will still have to find a way to make the destination flexible.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    Sadly the "interesting" data is in columns 20 onwards.

    I suppose what I'm looking for is something that will import a file that may contain 1 to x columns and import them all as varchar(255)...

    Is that doable?

    A script to work out how many columns are in the file, and then tell the data flow that it should expect that many?

    I know I'm clutching at straws, but I don't want to load 8,000 files manually... or inflict it on some poor temp!

  • I think your best option is to use a script task.

    This script task will open a file, read a line and split that line into multiple strings (according to the delimiter).

    Store these strings in a string array.

    Then write some dynamic SQL that creates a table from scratch, with as many columns as there are elements in the array.

    Then write the lines you've read to that table.

    This will generate a table for each file. A problem is that you can't generate really useful names for those columns.

    I think it will take some coding effort, but on the other hand, loading 2,000 CSV files by hand will take more time 🙂
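
    Roughly, the idea might look something like the untested sketch below — written as a stand-alone C# console program for readability (in SSIS 2005 the script task itself is VB.NET, so you would have to translate it). The folder path, connection string, table naming and comma delimiter are all placeholder assumptions:

      // Untested sketch: for each CSV in a folder, count the columns on the first line,
      // create a matching table of varchar(255) columns, and insert every line into it.
      using System;
      using System.Data.SqlClient;
      using System.IO;

      class CsvLoader
      {
          static void Main()
          {
              string folder = @"C:\Import";                                             // placeholder folder
              string connStr = "Server=.;Database=Staging;Integrated Security=SSPI;";   // placeholder connection

              using (SqlConnection conn = new SqlConnection(connStr))
              {
                  conn.Open();

                  foreach (string file in Directory.GetFiles(folder, "*.csv"))
                  {
                      string[] lines = File.ReadAllLines(file);
                      if (lines.Length == 0) continue;

                      // Naive comma split; quoted fields containing commas would need a real CSV parser.
                      int columnCount = lines[0].Split(',').Length;
                      string tableName = "[" + Path.GetFileNameWithoutExtension(file) + "]";

                      // Generic column names (Col1, Col2, ...), everything varchar(255).
                      string[] colDefs = new string[columnCount];
                      for (int i = 0; i < columnCount; i++)
                          colDefs[i] = "[Col" + (i + 1) + "] varchar(255)";

                      string createSql = "CREATE TABLE " + tableName + " (" + string.Join(", ", colDefs) + ")";
                      using (SqlCommand create = new SqlCommand(createSql, conn))
                          create.ExecuteNonQuery();

                      // One parameterised INSERT per line (the header line goes in as a data row too).
                      string[] parms = new string[columnCount];
                      for (int i = 0; i < columnCount; i++)
                          parms[i] = "@p" + i;
                      string insertSql = "INSERT INTO " + tableName + " VALUES (" + string.Join(", ", parms) + ")";

                      foreach (string line in lines)
                      {
                          string[] values = line.Split(',');
                          using (SqlCommand insert = new SqlCommand(insertSql, conn))
                          {
                              for (int i = 0; i < columnCount; i++)
                                  insert.Parameters.AddWithValue("@p" + i, (i < values.Length ? values[i] : ""));
                              insert.ExecuteNonQuery();
                          }
                      }
                  }
              }
          }
      }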

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    I like that idea a lot.

    The files all contain header records, so could I not use aspects of the array to produce the column names?

    I think I will have some reading up to do, as arrays are not something I have used in SSIS yet... 🙂

    Ah, yes, you could store the first line of the file in a separate array and use that one to create the column titles.

    It is not really using arrays in SSIS, but rather using arrays in VB.NET (or C#.NET, depending on the version you are using). Luckily that is well documented on the internet 🙂
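
    For instance, the column-definition part of the sketch above could be swapped for something along these lines (again untested, and it assumes the header values are unique and don't contain characters that are awkward in a column name):

      // Use the header row (the first line of the file) for the column names
      // instead of the generic Col1, Col2, ... names.
      string[] headers = lines[0].Split(',');
      string[] colDefs = new string[headers.Length];
      for (int i = 0; i < headers.Length; i++)
          colDefs[i] = "[" + headers[i].Trim() + "] varchar(255)";
      string createSql = "CREATE TABLE " + tableName + " (" + string.Join(", ", colDefs) + ")";

      // And skip the header line when inserting the data, e.g.
      // for (int i = 1; i < lines.Length; i++) { ... }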

    But be careful, debugging a .NET script task in SSIS is a real pain, since there aren't any real debugging features...

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I can foresee a few sleepless nights ahead then!

    Many thanks for your suggestions on this,

    If anyone reading this just happens to have a code example in their back pocket, it would be much appreciated. If not, I will be posting my solution (if it ever works) here for future reference 🙂

  • da-zero (3/24/2010)

    But be careful, debugging a .NET script task in SSIS is a real pain, since there aren't any real debugging features...

    That is incorrect. Full debugging features are available for script tasks - I think you are getting confused with script components.

    There is, perhaps, an alternative approach where a Foreach loop can be used.

    How about importing the data with the whole of each record in a single column, and then using T-SQL afterwards to dissect the imported results? Add an extra derived column (filespec) to allow records from the same file to be processed as a batch. Dealing with the column headers would be a pain though - still thinking about that one (maybe add an IDENTITY column on the destination table, then the header for each batch will be the one with the lowest identity value).

    Just wanted to throw the idea into the mix in case it is of interest.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks for the input,

    Still investigating all angles,

    Might give up and create a BULK INSERT script instead, as that may just do what I want it to...

    Thanks again to all who commented.

  • Phil Parkin (3/24/2010)


    That is incorrect. Full debugging features are available for script tasks - I think you are getting confused with script components.

    Ah, then I'm indeed mistaken. Thanks for correcting me.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Another option is to load all data into a single column and write custom code to parse it afterwards. Similar to the idea of loading it into an array, except that you are staging it in SQL...

  • Tim Weintz (3/24/2010)


    Another option is to load all data into a single column and write custom code to parse it afterwards. Similar to the idea of loading it into an array, except that you are staging it in SQL...

    That looks remarkably similar to my suggestion ... :w00t:

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (3/25/2010)


    Tim Weintz (3/24/2010)


    Another option is to load all data into a single column and write custom code to parse it afterwards. Similar to the idea of loading it into an array, except that you are staging it in SQL...

    That looks remarkably similar to my suggestion ... :w00t:

    Oh well. You know what they say, "Great minds..." 😛

    I just reread your post, and I suppose mine was a bit redundant. I recently had a similar problem where a vendor transmits several different formats in a single file. I loaded them into a single column, then parsed them into a dummy table (with the maximum number of fields) where my package could easily examine key elements to determine which layout a given record belonged to, before sending them to their respective destination tables. Headers weren't a problem in my case, since there were only a fixed number of layouts and they could be identified by data within each record.
