July 21, 2005 at 8:03 am
Hi All,
I am having problems with importing a text file. They have sent the file in the wrong format. Now it is my part to clean it before loading into table.
The file format which i have now is as follows
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Expr1000 | Expr1001 | Expr1002 | Expr1003 | Expr1004 | Expr1005 | Expr1006 | Expr1007 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | | Hamburger | Total Produits | Total Produits Repas | Petits Produits | Petits Produits | Hamburger |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 10 | | King Fish | Total Produits | Total Produits Repas | Total Gr.Prod.+ | Grands Produits | King Fish |
I need to remove all those --- before the data. I have those --- before and after each data row. Also I need to remove the spaces in the data columns. It should be like this
1| |Hamburger|Total Produits|Total Produits Repas|Petits Produits|Petits Produits|Hamburger
Please let me know if there is a way to clean this file.
Thanks,
Sridhar.
July 22, 2005 at 2:49 am
My own method of dealing with this would be as follows :-
Open the file in Textpad (do a google search if you dont have it).
Use Search>Replace on the hyphens and do not enter a replacement value. The hyphens will be gone, but you will have a blank line between each row of data.
Use Search>Replace again on " | " and replace with "|" this will close the spaces on your field separators. Repeat for "| " and " |" if required.
Open the file in Excel, and select the delimited option from the text import wizard. Your data should appear as one long string in column A. Highlight all the data except the header, and select Data>Sort>ColumnA>Ascending.
This will move all your blank lines to the end of the file, which you can either delete in excel, or just copy your data to a new workbook.
Note, this will not resolve your problem if:-
1. You have to regularly sanitise the data - in which case i would go back to the data supplier and resolve the issue there.
2. You have more than 60k rows in your file, as Excel will only support 65k.
July 22, 2005 at 2:53 am
Hi Sridar,
You could try cleaing the file with a simple vbs script like this (change the name and path for the files obviously)
If this is not suitable for you then you could probably include this is a transform in DTS, but it may be awkward if you are ignoring whole line.
copy and paste it into notepad and save it as a .vbs file
Dim objFSO , objInFile, objOutFile
dim s
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile = objFSO.OpenTextFile("d:\ToClean.txt" , 1)
Set objOutFile = objFSO.OpenTextFile("d:\Cleanfile.txt" , 2)
while not objInFile.AtEndOfStream
s=replace((objInFile.ReadLine),"-","")
if s<>"" then
s=replace(s,"| ","|")
s=replace(s," |","|")
objOutfile.write replace(s," | ","|") & chr(13) & chr(10)
end if
wend
objInFile.Close
Set objFSO = Nothing
Set objInFile = Nothing
Set objOutFile = Nothing
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply