cleaning the file to import using DTS

  • Hi All,

    I am having problems with importing a text file. They have sent the file in the wrong format. Now it is my part to clean it before loading into table.

    The file format which i have now is as follows

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    | Expr1000 | Expr1001 | Expr1002 | Expr1003 | Expr1004 | Expr1005 | Expr1006 | Expr1007 |

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    | 1 | | Hamburger | Total Produits | Total Produits Repas | Petits Produits | Petits Produits | Hamburger |

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    | 10 | | King Fish | Total Produits | Total Produits Repas | Total Gr.Prod.+ | Grands Produits | King Fish |

    I need to remove all those --- before the data. I have those --- before and after each data row. Also I need to remove the spaces in the data columns. It should be like this

    1| |Hamburger|Total Produits|Total Produits Repas|Petits Produits|Petits Produits|Hamburger

    Please let me know if there is a way to clean this file.

    Thanks,

    Sridhar.

  • My own method of dealing with this would be as follows :-

    Open the file in Textpad (do a google search if you dont have it).

    Use Search>Replace on the hyphens and do not enter a replacement value.  The hyphens will be gone, but you will have a blank line between each row of data.

    Use Search>Replace again on " | " and replace with "|" this will close the spaces on your field separators.  Repeat for "| " and " |" if required.

    Open the file in Excel, and select the delimited option from the text import wizard.  Your data should appear as one long string in column A.  Highlight all the data except the header, and select Data>Sort>ColumnA>Ascending.

    This will move all your blank lines to the end of the file, which you can either delete in excel, or just copy your data to a new workbook.

    Note, this will not resolve your problem if:-

    1. You have to regularly sanitise the data - in which case i would go back to the data supplier and resolve the issue there.

    2. You have more than 60k rows in your file, as Excel will only support 65k.

  • Hi Sridar,

    You could try cleaing the file with a simple vbs script like this (change the name and path for the files obviously)

    If this is not suitable for you then you could probably include this is a transform in DTS, but it may be awkward if you are ignoring whole line.

    copy and paste it into notepad and save it as a .vbs file

    Dim objFSO , objInFile, objOutFile

    dim s

    Set objFSO = CreateObject("Scripting.FileSystemObject")

    Set objInFile = objFSO.OpenTextFile("d:\ToClean.txt" , 1)

    Set objOutFile = objFSO.OpenTextFile("d:\Cleanfile.txt" , 2)

    while not objInFile.AtEndOfStream

     s=replace((objInFile.ReadLine),"-","")

     if s<>"" then

      s=replace(s,"| ","|")

      s=replace(s," |","|")

      objOutfile.write replace(s," | ","|") & chr(13) & chr(10)

     end if

    wend

    objInFile.Close

    Set objFSO = Nothing

    Set objInFile = Nothing

    Set objOutFile = Nothing

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply