February 8, 2013 at 1:28 am
Hi,
I have an ever-increasing number of files in a single directory (will probably end up with hundreds). There are five different types of files, denoted by their file names. For example, I may have (for each file type):
ABCZZZZ123.CSV
DZZZZ872.CSV
123999_ZZZZ.CSV
87182XXXXASDFA.CSV
DX11XXXXA1.CSV
D21ZZXXXX.CSV
All files of a given type have identical structures, column headers, etc. That is, all ZZZZ files are identical, all XXXX files are identical, and so on.
I need to load all of these files, by type, into SQL Server. I have created a table for each of the different file types. I can successfully process a single file (explicitly specifying its name) into its table.
So, to load all CSV files, I guess I need a ForEach Loop container in SSIS, and some script to split the file name to pick out the type of file it is? My SSIS skills are pretty dire, and I don't really know how best to do this.
Any guidance/solution would be gratefully received.
Thanks,
Andrew
February 8, 2013 at 7:33 am
Not sure if it'll work but it's something to try...
Try the Conditional Split transformation.
Capture the file name in the loop, parse out the file type, and store it in variables.
Then set each split condition to the type of file, and set the destination for each condition path.
You'll probably need to set DelayValidation to True, and mapping might be an issue.
You'll probably have to map each file type one at a time first and live with the red dot for a bit.
I think I've done something like this before with the Conditional Split, but it's been 6 years....
---------------------------------------------------------------
Mike Hahn - MCSomething someday:-)
Right way to ask for help!!
http://www.sqlservercentral.com/articles/Best+Practices/61537/
February 9, 2013 at 8:08 am
- removed -
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 10, 2013 at 7:58 am
I re-read the original post this morning and realized I had misread an important requirement so I removed my last post which was way off the mark.
I do not think a Conditional Split is necessary here. Since all files have the same structure, you can have a ForEach Loop Container (FELC) with a single Data Flow Task. The Data Flow Task will contain a Flat File Source and an OLE DB Destination. The Flat File Source will refer to a Flat File Connection Manager that builds its ConnectionString using the Variable the FELC uses to store the file name. The Data access mode of the OLE DB Destination will be "Table name or view name variable". The Variable it points to will have EvaluateAsExpression set to True and will use the file name Variable in its Expression to determine the table name.
For example, this expression says: if the file name contains ZZZ, return TableZZZ; else if it contains XXX, return TableXXX; else return TableDefault. You can expand this as necessary; just use parentheses liberally to keep it organized:
FINDSTRING( @[User::FileNameVariable], "ZZZ", 1 ) > 0 ? "TableZZZ" : (FINDSTRING( @[User::FileNameVariable], "XXX", 1 ) > 0 ? "TableXXX" : "TableDefault")
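If it helps to see the routing logic outside of SSIS, here is a minimal Python sketch of the same nested-ternary check. The table names (TableZZZ, TableXXX, TableDefault) are placeholders carried over from the expression above; FINDSTRING(name, "ZZZ", 1) > 0 in the SSIS expression language corresponds to a simple substring test here.

```python
def table_for_file(file_name: str) -> str:
    """Route a file to a table name by substring, mirroring the SSIS
    expression: checks are evaluated in order, so a name containing
    both patterns resolves to the first match."""
    if "ZZZ" in file_name:
        return "TableZZZ"
    elif "XXX" in file_name:
        return "TableXXX"
    else:
        return "TableDefault"
```

Note that the order of the checks matters: a name like D21ZZXXXX.CSV has only two Zs in a row, so it falls through to the XXX branch, which is what the expression above does as well.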
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 10, 2013 at 2:27 pm
adb2303 (2/8/2013)
Hi, I have an ever-increasing number of files in a single directory (will probably end up with hundreds). There are five different types of files, denoted by their file names. For example, I may have (for each file type):
ABCZZZZ123.CSV
DZZZZ872.CSV
123999_ZZZZ.CSV
87182XXXXASDFA.CSV
DX11XXXXA1.CSV
D21ZZXXXX.CSV
How do you intend to keep track of which files were loaded and which were not?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2013 at 2:43 pm
The users will be responsible for clearing out the directory these files are stored in. My instructions are to load whatever's in that directory every time the solution runs, even if the same file gets loaded over and over again. I have a process in place within the SSIS package to archive/de-duplicate data.
February 10, 2013 at 4:49 pm
adb2303 (2/10/2013)
My instructions are to load whatever's in that directory every time the solution runs, even if the same file gets loaded over and over again.
Wow. Whoever wrote those instructions has a serious hooie problem. If the database is designed correctly, it won't let you make such a mistake as what they're asking for.
Since you don't have to actually move files, rename them, or anything else that requires any real power, why not simplify all of this just by using BULK INSERT right straight from some scheduled T-SQL stored procedures? After all, the data is just some consistently formatted CSV files, isn't it?
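For a rough idea of the approach being suggested, here is a hedged Python sketch that builds one BULK INSERT statement per CSV file, routing each file to a table by the substring in its name. The table names, routing patterns, directory path, and WITH options are illustrative assumptions, not anything from the thread; in a real deployment this string-building would be dynamic SQL inside the scheduled stored procedure.

```python
import os

# Substring pattern -> target table, checked in order (illustrative names).
ROUTES = [("ZZZZ", "TableZZZZ"), ("XXXX", "TableXXXX")]

def bulk_insert_sql(directory: str, file_names: list) -> list:
    """Build one BULK INSERT statement per recognized CSV file."""
    statements = []
    for name in file_names:
        if not name.upper().endswith(".CSV"):
            continue
        table = next((t for pat, t in ROUTES if pat in name.upper()), None)
        if table is None:
            continue  # unrecognized file type: skip rather than guess a table
        path = os.path.join(directory, name)
        statements.append(
            f"BULK INSERT {table} FROM '{path}' "
            "WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n');"
        )
    return statements
```

The FIRSTROW = 2 option assumes each file has a header row; adjust to match the actual file layout.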
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2013 at 5:22 pm
SSIS OLE DB Destinations use the same bulk-load API that BULK INSERT does. The only difference here is that SSIS can access the file system natively, whereas xp_cmdshell would have to be enabled to do the same using T-SQL, a clear disadvantage imho.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 10, 2013 at 6:11 pm
opc.three (2/10/2013)
SSIS OLE DB Destinations use the same bulk-load API that BULK INSERT does. The only difference here is that SSIS can access the file system natively, whereas xp_cmdshell would have to be enabled to do the same using T-SQL, a clear disadvantage imho.
Absolutely not true, Orlando. There is absolutely no need of xp_CmdShell for this given "if it's there, load it" circumstance.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2013 at 6:20 pm
My understanding is the file names, while showing similar patterns, change from day to day.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 10, 2013 at 6:40 pm
opc.three (2/10/2013)
My understanding is the file names, while showing similar patterns, change from day to day.
Understood and agreed. But, you don't need xp_CmdShell to read those file names. Just use xp_DirTree and Bob's your uncle. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2013 at 8:38 pm
I could be wrong, because I do not do much of this type of programming, but I think xp_dirtree will only get you the list of directories, not the files within them. I think to get the list of files within a directory you'd need to use xp_cmdshell.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 10, 2013 at 10:43 pm
Heh... you just have to know that I wouldn't say such a thing unless I knew. Please try the following command. The thing that most people don't know about xp_DirTree is that it actually takes 3 parameters. If the third parameter is something other than "0", it will list the file names along with an extra column to identify what is a file and what is not.
EXEC xp_DirTree 'C:\',1,1
And, yes... I'm in the process of writing an article on its use in this area.
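To illustrate how the extra column would be used downstream, here is a small Python sketch that filters rows shaped like the output of that command. It assumes each row is a (name, depth, is_file) tuple with is_file = 1 flagging files and 0 flagging directories, which matches the behavior Jeff describes; the column names and exact result-set shape are assumptions.

```python
def csv_files(dir_tree_rows):
    """Filter rows shaped like the output of EXEC xp_DirTree 'C:\\',1,1.

    Each row is assumed to be (name, depth, is_file); keep only the
    top-level entries that are files ending in .CSV."""
    return [name for name, depth, is_file in dir_tree_rows
            if is_file == 1 and depth == 1 and name.upper().endswith(".CSV")]
```

In practice you would capture the procedure's result set into a temp table and filter it there rather than in client code; this sketch only shows the file/directory distinction the third parameter enables.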
--Jeff Moden
Change is inevitable... Change for the better is not.
February 11, 2013 at 6:51 am
Nice. I figured there was something else up your sleeve, otherwise you would not have said anything. That could be handy for some, as it relates to their ability to enable xp_cmdshell in their environment, although you know how I feel about accessing the file system from T-SQL regardless of the means 😉 It's a shame it is officially an undocumented procedure. I could maybe see myself using it for some discovery work in a pinch, if PowerShell remote management, Remote Desktop, and xp_cmdshell were all unavailable (yes, I use xp_cmdshell in a pinch from time to time, although I shower as soon as possible afterward), but I could not see myself using it in deployed production code.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato