Loading Multiple flat file with SSIS.

  • Hi all,

    I need to design a SISS for uploading between 5-6 flat files each month.

    These flat files are the same format however each has around 1 million records.

    what i have done so far.

    1. Back up database task

    2. Execute SQL task to check if the staging table already exists if not create one.

    3. Create a foreach loop to loop through all the *.txt files in the target folder eg:- c:\Project\Upload

    Now i need help designing a soution to upload each 1 million record text file into the Staging table.

    Advise sought here is how to achieve this for lets say 7-8 flat files with upto 10 millions customer records. I also need to record quantities of each for into a logSummary table.

    I assume using a Data flow task within the foreach loop where we have a Flat file SOurce(each falt file) and OLE DB destination (stagingTable) and derived tranform in between will take just too much time.

    In fact this is what i am doing right now and finding the whole process too slow, was wondering if you have any better ideas.

    Format of each text file.

    Cust_ID, FullName, Address1, Address2, city,state, postcode, Firstname,Surname,Title

    Fields in the staging table

    Emloyee_ID, Source,Emp_Status,Customer_Name, Address1,Address2,city,state, postcode, pruchase,product,product_Source

    note source is the field poulated with value 'EMPLOYEE' since all the Flat file have Emloyee info.

  • You didn't mention what you're doing with the Derived Column transform for each data file. What is it that this transform is doing? How long does it take to load the 10m rows or so?

  • Derived Transform is used to assign Source field with "Employee" value.

    Also a derived column called Upload Date is there with the GETDATE() value plus there are few other derived columns populated with values dependent on the Flat file i am extracting data from.

    Have also used a dataconversion tranform as to convert data (Dt_STR) from flatfiles to their correspoding integer or bit datatype (Not able to reslove this issue yet inspite of using the dataconversuion transform) fields in the staging table, have also come across truncation issues where DT_STR is (50) from the flat file but the coressponding field is only 15 characters long in the destination staging table.

    Source and Customer_ID form composite PK in the staging table.

    Right now i am working with small sample files however in the past when i have used such packages with data conversion + derived transforms with dataset ranging between 250,000 - 500,000 records, the whole process has slowed down signifcantly and have found stored procedures called via Execute sql TAsk a more viable solution.

    However in this case i would like to do as much as possible of the upload process via BIDS.

    Anyone advise from those with experience creating such upload packages will be greatly appreciated.

    My main concern is trying to marry up DT_STR(50) datatypes from the Flatfile with BIT,INTEGER,DT_STR(10-20smaller than 50 therfore get truncation error messages) using derived +datconversion tranform i believe will slow things down substantially? + time for the whole process to complete.

  • If the format of your input files is static and consistent, you could change the data types in your flat file connection rather than explicitly convert them later in the package. This could help eliminate some of the data conversion transformations.

    For the truncated fields, are your ETL needs such that truncation is OK? You can modify the settings of the flat file source to silently ignore truncation errors, but you will lose the ability to capture that data (for auditing or any other purpose) further down in your package.

    hth,

    Tim

  • How can i change datatypes in my flatfile Connection manager?

    IN regrads to truncation by using casting etc in derived columns i have resolved my issues.

    However still have casting errors when i try and do the following:-

    From the flat file i have a field called Prod_ID (DT_STR) Lenght= 50 and 2other fields P_Item,P_New_Items hvaing the same Datatype+lenght.

    Now in my OLE DB destination datatype for these 3 fields are INTEGER/[DT_14], and Boolean[DT_BOOL] for the other two fields.

    When i try to cast these in the derived tranform i get casting errors:

    1. [DT_I4] substring([prod_ID],1,8)

    alternatively have also trioed this:-

    [DT_I4] [prod_ID]

    theoratically my understanding is that this should work but NO luck!

    2. Other boolean conversion is a bit strange the first one works but second doesn't?

    [DT_BOOL] [ColumnName] - works for one bit field but not the other???

  • Boolean casting problem is because in the flatfile the field being converted is poulated with ""?

    i have tried this and it passes through ok via deirved transform but when it tries to insert into ole destination i get an error.

    [DT_bool] isnull[Fieldname] || [Fieldname] = " " ? "0" : [Fieldname]

  • You can set the data types in the flat file connection manager under the Advanced tab of that control. Clicking each column will allow you to view or change the data type (and length, in the case of character data) for that column. However, simply changing the data type will not clean up the data; the component will still still error if you try to do an invalid cast, such as casting alphanumeric text to a number, or a space character to a Boolean.

    If you have to do a lot of data transformation, I would definitely recommend that you use a DT_STR or DT_WSTR value in the file connection and do your cleanup with another transformation such as the Derived Column. It can be more expensive in terms of resources, but you have more options and better error handling control.

    hth,

    Tim

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply