Load into different tables based on the number of columns in each text file.

  • I have 1000 text files in one folder; the files have 2, 3, or 4 columns. I need to load those text files into SQL Server tables - each file into the respective table with 2, 3, or 4 columns. I know how to use a Foreach Loop; it works when every file has the same number of columns and the destination is a single table. Here the number of columns varies from file to file, and so does the destination table.

  • Munabhai (7/9/2013)


    I have 1000 text files in one folder; the files have 2, 3, or 4 columns. I need to load those text files into SQL Server tables - each file into the respective table with 2, 3, or 4 columns. I know how to use a Foreach Loop; it works when every file has the same number of columns and the destination is a single table. Here the number of columns varies from file to file, and so does the destination table.

    Is there anything about the file names which allows you to identify what sort of file you are dealing with? Or do you have to open the file before you can tell?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Case 1: when you say that you have files with 2/3/4 columns, does it mean that files with 2 columns always have the same 2 columns, files with 3 columns always have the same 3 columns, and files with 4 columns always have the same 4 columns?

    Case 2: is there any distinct identifier in the file names for these files with 2/3/4 columns?

    If Case 1 is true, then you can segregate the files with 2/3/4 columns into separate directories and create three Foreach Loops, each with one Data Flow Task inside.

    If Case 2 is true, you can create 3 Data Flow Tasks in a single Foreach Loop; you can then retrieve the file name into a variable and, based on a condition, route control to load the file into one of the 3 Data Flow Tasks.
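    For example, if the file names contained a marker like "2col" (a purely hypothetical pattern, with @[User::varFileName] as a hypothetical variable name), the precedence constraint leading to the 2-column Data Flow Task could use an expression such as:

    FINDSTRING(@[User::varFileName], "2col", 1) > 0

    with similar expressions on the constraints for the 3- and 4-column paths.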

    :rolleyes:

  • Comment removed - it was wrong!

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • No, I do not have to do any transformation. It's a straightforward load only.

  • Yes, there is a fixed number of columns, but regarding your Case 1: how do I separate them? Just by looking at the file name, I cannot tell the number of columns. If I could tell the column count from the file name, then I could do as you said.

    As for your Case 2: no, there is no distinct file name based on the column count.

  • This may seem like a quite circuitous solution, but it just may work...

    This is not a step-by-step solution but more of an abstract...

    Loop through your files and bring the first record in - ignoring column delimiters - as a single-column record and perform the following:

    Use your choice of transform (derived column, script task, etc.) to count the number of column delimiters there actually are in the record.

    So if you were using pipes as delimiters, you could count the number of pipes in the record - 1 pipe would mean it's a 2-column file, 2 means a 3-column file, and 3 pipes means a 4-column file.

    Based on the outcome, move the original file to a directory specifically for files with that count of delimiters - i.e. 1, 2, or 3 (representing your 2, 3, or 4 columns respectively).

    Continue looping until all files have been moved.

    This should have identified which files have which number of columns, and you should be able to loop through each of those sets of files and process the data as you need to for 2, 3, and 4 columns respectively.
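    If it helps to see the counting idea concretely, here is a minimal T-SQL sketch of it, assuming the first record of each file has been staged into a single-column table (the names dbo.FirstRowStage and WholeRecord are placeholders, not anything you already have):

    select WholeRecord,
           len(WholeRecord) - len(replace(WholeRecord, '|', '')) as NoOfPipes
    from dbo.FirstRowStage;

    A row with NoOfPipes = 1 came from a 2-column file, 2 from a 3-column file, and 3 from a 4-column file.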

    Hope this can help you get to your solution - if you have any questions let me know!

  • That sounds like a good idea.

    I will try it and let you know.

  • Hello sir, I could load the data into a single-column table, but I need to split on the commas.

    select empFirstName,
           len(empFirstName) - len(replace(empFirstName, ',', '')) as NoOfComma
    from [dbo].[test]

    I can load the data into three different tables using WHERE NoOfComma = 1/2/3 (1 comma for a 2-column row, 2 for a 3-column row, 3 for a 4-column row), but I still need to split on those commas to load the final tables.
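    That is, something like this sketch (dbo.Stage2Col here is just a placeholder name for one of my three tables):

    -- route 2-column rows (1 comma); repeat with = 2 and = 3 for the other tables
    insert into dbo.Stage2Col (WholeRecord)
    select empFirstName
    from [dbo].[test]
    where len(empFirstName) - len(replace(empFirstName, ',', '')) = 1;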

    As you said, I tried to use a Derived Column, but I do not know how to use it.

  • I think you may have misunderstood what I was suggesting just a little bit.

    I'm suggesting that you use SSIS to sort the files themselves into the 2/3/4 "containers" first, before you load any data into any tables.

    You'll need to create a variable to hold the column count in your files; for now we'll make it package level and call it varColumnCount (Int32). You'll also need to create a few more package-level variables: one to hold the original file name (varOrgFile), one for the path to your original directory (varOrgDir), and one for the path to your 2/3/4 directories (varDestDir). Seed these variables with real default values if possible.

    You'll need to create a Foreach Loop and put a Data Flow Task and a File System Task inside it. You'll also need to set the value of the User::varOrgFile variable in the Foreach Loop.

    For each of your initial files, use the Data Flow Task to bring the entire record into the data flow as one column - I'll call it WholeRecord for now.

    To simplify these suggestions - and since I don't have SSIS in front of me to verify/test things - I won't suggest bringing in only the first record of each file for now. Go ahead and bring in all the records from the files like you normally would.

    Then use a Derived Column transformation to get the count of delimiters (sounds like commas in your case). In the Derived Column transformation, write an expression similar to your SELECT statement - something like this (I have not tested the syntax or the results of the following expression; I'm also sure there are other expressions you could use to get the count of delimiters):

    LEN([WholeRecord]) - LEN(REPLACE([WholeRecord], ",", "")) + 1

    Select the User:: variable to assign the results of the expression to - i.e. User::varColumnCount.

    Direct the output from this Derived Column transform to the new one you are about to create...

    Create another Derived Column transform and use expressions and your variables to format the path to the 2/3/4 destination directories - something similar to this (note that SSIS expressions need the backslash escaped and the Int32 variable cast to a string):

    @[User::varOrgDir] + "\\" + (DT_WSTR, 1)@[User::varColumnCount]

    Select the User:: variable to assign the results of the expression to - i.e. User::varDestDir.

    Direct the output of the Derived Column transform to a Recordset Destination or something similar - we don't care about this data yet; it has served its purpose and we have the count of columns.

    Back in the Control Flow, in the Foreach Loop, use the File System Task to copy the file to the destination 2/3/4 directory. Use expressions and your variables to determine which file to move (varOrgFile and varOrgDir) and where to move it (varDestDir and varOrgFile).
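    As a rough illustration, the two path expressions on that File System Task might look like this (again untested, using the same escaped-backslash convention as above):

    Source:      @[User::varOrgDir] + "\\" + @[User::varOrgFile]
    Destination: @[User::varDestDir] + "\\" + @[User::varOrgFile]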

    Now create 3 more Foreach Loops - one for each of your 2/3/4 directories - each using a Data Flow Task to loop through and process that directory's files.

    That's about as much of a suggestion as I can make from here, but I hope it gives you enough to work with - or for others to read and suggest improvements 🙂

    Good luck!

  • Thanks for your help.

    I am very close, as you suggested last time.

    I used a Foreach Loop and loaded all the data into one table which has a single column.

    In T-SQL, I counted the number of commas (the delimiter) and loaded the rows into the 3 respective tables.

    To grab those individual values, I used CHARINDEX and SUBSTRING and finally loaded them into the right tables.
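    In case it helps anyone else, the split for a 3-column row looked roughly like this (a simplified sketch - my real table and column names differ):

    -- split 'a,b,c' held in WholeRecord into three columns with charindex/substring
    select substring(WholeRecord, 1, charindex(',', WholeRecord) - 1) as Col1,
           substring(WholeRecord,
                     charindex(',', WholeRecord) + 1,
                     charindex(',', WholeRecord, charindex(',', WholeRecord) + 1)
                         - charindex(',', WholeRecord) - 1) as Col2,
           substring(WholeRecord,
                     charindex(',', WholeRecord, charindex(',', WholeRecord) + 1) + 1,
                     len(WholeRecord)) as Col3
    from dbo.Stage3Col;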
