Inconsistent CSV Files

  • I need to use SSIS to automatically import a group of CSV files. The problem is that not all of the files are consistent. Please let me show you an example.

    CSV FILE #1's Columns

    FISCAL_WEEK_NUM

    MANUFACTURER

    MANUFACTURER_PART_NO

    SKU_NUM

    SKU_DESC

    SHIP_QTY

    TOTAL_FPC_DOLLARS

    SHIP_TO_CITY

    SHIP_TO_STATE

    SHIP_TO_ZIPCODE

    SEGMENT_DESC

    SUB_CHANNEL_DESC

    OPTION_CLASS

    SHIP_DATE_ACTUAL_DATE

    CSV FILE #2's Columns

    FISCAL WEEK

    MANUFACTURER

    MANUFACTURER PART NUMBER

    SKU NUMBER

    SKU DESCRIPTION

    SHIPPING QUANTITY

    SHIPPING DOLLARS?

    SHIP TO CITY

    SHIP TO STATE

    SHIP TO ZIP CODE

    SEGMENT DESCRIPTION

    CHANNEL DESCRIPTION

    OPTION CLASS

    ORDER NUMBER

    CUSTOMER NUMBER

    BILL TO COMPANY

    LINK NUMBER

    LOCAL CHANNEL

    INVOICE DATE

    SUB CLASS

    SHIP TO COMPANY

    As you can see, there are some common columns, but not all. So I modified the destination SQL Server 2008 table to contain columns from both source files. However, I noticed that the Flat File Connection Manager assumes the columns will be "set in stone." Is there a way around this? I don't think I'm allowed to go back to the organization that provides the source files and tell them to fix the problem.

  • If you can distinguish between the files by their name or something obvious. Create parallel datastreams that join in a union.

    File1Source >> DerivedColumn to add missing columns >> Union

    File2Source >> Union

    Then move everything through all other needed transforms and to your destination.

  • This is a good idea, except there will be many files in one format and many files in another format. I might be able to figure out what format based on the file name (newer files tend to be in the second format) but I can't hard code the file name into the connection.

  • I wasn't suggesting that you hard code the file name, merely that their might be file naming conventions that you can exploit. If all of one type of file have a naming convention that is different from the naming convention of the other type of file you should be able to define one loop to look for the first file format, and the other to look for the second file format.

  • Here is what I ended up doing. I had the Foreach Loop put the file name in a variable. Then I had a script do the basically the follwoing:

    Dts.Variables("intFormat").Value = 0

    If lintFileDate < 200934 Then

    Dts.Variables("intFormat").Value = 1

    ElseIf lintFileDate >= 200934 And lintFileDate < 201101 Then

    Dts.Variables("intFormat").Value = 2

    ElseIf lintFileDate >= 201101 And lintFileDate < 201210 Then

    Dts.Variables("intFormat").Value = 3

    ElseIf lintFileDate >= 201210 Then

    Dts.Variables("intFormat").Value = 4

    End If

    And then I had precidence constraints for each of the four intFormat variable values. This way, I can branch out and deal with each of the formats based on the file names.

  • Oh, and you were absolutely right, there were naming conventions for the file names. Thank you so much!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply