March 24, 2010 at 3:33 am
Hi all,
First time here so please bear with me,
We have recently received a data dump from one of our suppliers of over 2,000 CSV files.
The first 19 fields are always the same but after that the data is specific to each file.
I have routines in place that can stitch the data together again once its been loaded, but my problem is so far i have had to manually load the files one by one.
There is the potential that i will have a further 8,000 files so idealy im looking for a SSIS Loop solution if possible!
So to recap:
Im running SQL 2005, with full access.
Im looking to import a large voume of CSV files in different formats, each into their own table inside my database
If they were the same format i would usually use a for...each loop but i am not sure how that would be applicable here.
Thanks for reading,
please post if you can!
March 24, 2010 at 3:46 am
Well, for the first 19 you can use a for each loop.
But you can't use a flat file source/destination if your columns change each time, so you're stuck with scripting.
One alternative is to read the CSV files with a minimum of columns and then split out the remaining columns in your data flow. But, you still will have to find a way to make the destination flexible.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 24, 2010 at 3:54 am
Sadly the "interesting" data is column 20 onwards.
I suppose what im looking for is something that will import a file that may contain 1-x columns and import them all as varchar(255)...
is that doable?
A script to work out how many columns are in the file etc then tell the data flow that it should expect that many?
I know im clutching at straws but i dont want to load 8,000 files manually... or inflict it on some poor temp!
March 24, 2010 at 4:04 am
I think your best option is to use a script task.
This script task will open a file, read a line and split that line into multiple strings (according to the delimiter).
Store these strings in a string array.
Then write some dynamic SQL that creates a table from scratch, with as much columns as there are in the array.
Then write the lines you've read to that table.
This will generete a table for each file. A problem is that you can't generate really useful names for those columns.
I think it will take some coding effort, but on the other hand, loading 2000 csv files will take more time 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 24, 2010 at 4:16 am
I like that idea a lot,
the files all contain header records, so could i not use aspects of the array to produce the column headers?
I think i will have some reading up to do as arrays are not something i have used in SSIS yet... 🙂
March 24, 2010 at 4:25 am
Ah, yes, you could store the first line of the file in a seperate array and use that one to create the column titles.
It is not really using arrays in SSIS, but rather using arrays in VB.NET (or C#.NET, depending on the version you are using.) Luckily that is a well documented part on the internet 🙂
But be careful, debugging a .NET script task in SSIS is a real pain, since there aren't any real debugging features...
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 24, 2010 at 4:31 am
I can foresee a few sleepless nights ahead then!
Many thanks for your suggestions on this,
If anyone reading this just happens to have a code example in their back pocket, it would be much appreciated, if not i will be posting my solution (if it ever works) here for future reference 🙂
March 24, 2010 at 10:38 am
da-zero (3/24/2010)
--But be careful, debugging a .NET script task in SSIS is a real pain, since there aren't any real debugging features...
That is incorrect. Full debugging features are available for script tasks - I think you are getting confused with script components.
There is, perhaps, an alternative approach where a Foreach loop can be used.
How about importing the data as one record per column and then using T-SQL afterwards to dissect the imported results? Add an extra derived column (filespec) to allow alike records to be processed as a batch. Dealing with the column headers would be a pain though - still thinking about that one (maybe add an IDENTITY column on the destination table, then the header for each batch will be the one with the lowest identity value).
Just wanted to throw the idea into the mix in case it is of interest.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 24, 2010 at 11:00 am
Thanks for the input,
Still investigating all angles,
Might give up and create a BULK INSERT script instead, as that may just do what i want it too...
thanks again to all who commented
March 24, 2010 at 2:07 pm
Phil Parkin (3/24/2010)
That is incorrect. Full debugging features are available for script tasks - I think you are getting confused with script components.
Ah, then I'm indeed mistaken. Thanks for correcting me.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 24, 2010 at 4:30 pm
Another option is to load all data into a single column and write custom code to parse it, afterwards. Similar to the idea of loading it into an array, except that you are staging it in sql...
March 25, 2010 at 3:04 am
Tim Weintz (3/24/2010)
Another option is to load all data into a single column and write custom code to parse it, afterwards. Similar to the idea of loading it into an array, except that you are staging it in sql...
That looks remarkably similar to my suggestion ... :w00t:
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 25, 2010 at 10:21 am
Phil Parkin (3/25/2010)
Tim Weintz (3/24/2010)
Another option is to load all data into a single column and write custom code to parse it, afterwards. Similar to the idea of loading it into an array, except that you are staging it in sql...That looks remarkably similar to my suggestion ... :w00t:
Oh well. You know what they say, "Great minds..." 😛
I just reread your post, and I suppose mine was a bit redundant. I just had a similar problem where a vendor transmits several different formats in a single file. I loaded them into a single column, then parsed them into a dummy table (with the maximum number of fields) where my package could easily examine key elements to determine which layout a given record belonged to, before sending them into their respective destination tables. Headers weren't a problem in my case, since there were only n number of layouts and they could be identified by data within each record.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply