June 17, 2010 at 9:53 am
I have an Excel source file that has some rows with a line feed carriage return in the Address1 columns of the file. This causes SSIS to think it is the end of the row, and import the data incorrectly. Is there a way to remove the CRLF characters from the file using SSIS? Right now, I'm toying with converting the Excel file to XML to see if SSIS can remove them from an XML file (I was going to replace them with a space), but even if that is possible, I'll still have to figure out how to automatically convert the Excel file to XML. Unfortunately, they can only save my source file as .pdf or .xls. Any help/ideas is GREATLY appreciated.
Thanks,
Celita
June 17, 2010 at 9:58 am
You should be able to tell the Excel Data Source to treat the Address1 field as a text stream rather than as a string. Right-click on it and pick "Advanced Editor...", go to the last tab and pick the Address1 field, there you can override the data type (probably DT_STR) that SSIS automatically selected. You probably want DT_TEXT, that should allow it to read the whole field regardless of line feeds and other control characters. Then feed it into a Derived Column and you can use the string manipulation functions to REPLACE char(13) with char(32) in that field and any others.
June 17, 2010 at 10:03 am
Wow! That was fast. I should have posted yesterday! I see where you mean and will try that. Thank you VERY VERY much!
June 17, 2010 at 10:14 am
😀 I just hope it works, I did that from memory.
June 17, 2010 at 10:36 am
First I want to say that I'm not really familiar with the Advanced Editor, which you can tell since I completely forgot it even existed. Anyway, I'm now getting an error. I changed the Excel Source Output External Columns for the first Address1 field from Unicode String to Unicode Text Stream. This also changed the field type for the same column in the Excel Source Output Columns and Excel Source Error Output Output Columns to be Unicode Text Stream. Now my data source has a red X on it with the error below. What did I do wrong? The data isn't going anywhere yet.
Error6Validation error. Convert Data file: rqi excel source [678]: The output column "PatAddr1" (36688) on the error output has properties that do not match the properties of its corresponding data source column.RQI_Export_2.dtsx00
June 17, 2010 at 11:04 am
There are three places to change the data type. You did output and error, you also need to do input. Same place, different part of the tree.
June 17, 2010 at 11:25 am
That is what it SOUNDS like, but I reset and did it again. *I* changed only the Excel Source output External Columns, which is what is coming into Excel, and changed that one column PatAddr1 to Unicode text stream. I then checked the Excel Source output Output Columns, and the data type for PatAddr1 had been automatically changed to Unicode text stream when I changed it in the other location. I then checked the Excel Source Error Output, and it says Unicode string, and won't let me change it to Unicode text stream.
I've attached screen prints of what the columns look like in each section. The error message I get on the Error Output is also there.
June 17, 2010 at 12:44 pm
I've never tried this but supposedly you can add "IMEX=1; MAXROWSTOSCAN=0" to your Excel connection string to force SSIS to reconsider what the data type should be. I don't know that it will work in your case.
See also Microsoft Knowledge Base 189897 for more hints.
June 17, 2010 at 12:47 pm
I did try the IMEX=1, but that didn't help at all. I"ll try that knowledge base article. Thanks!
June 20, 2010 at 4:33 pm
Celita (6/17/2010)
I have an Excel source file that has some rows with a line feed carriage return in the Address1 columns of the file. This causes SSIS to think it is the end of the row, and import the data incorrectly. Is there a way to remove the CRLF characters from the file using SSIS? Right now, I'm toying with converting the Excel file to XML to see if SSIS can remove them from an XML file (I was going to replace them with a space), but even if that is possible, I'll still have to figure out how to automatically convert the Excel file to XML. Unfortunately, they can only save my source file as .pdf or .xls. Any help/ideas is GREATLY appreciated.Thanks,
Celita
If you can use third-party solutions, check the commercial CozyRoc Excel Source Plus component. The component doesn't try to be smart about the column type and you will be able to read the cell value without any additional tricks.
June 25, 2010 at 5:50 am
I have a working solution. I imported the Excel data into a table with ntext fields for the address1 fields. I also had a second new column for each address1 field. I then used a dreaded cursor to check for char(10), and if present, put everything before it in the new address1 fields and everything after it in address2 fields. Then, because I found that some people hit enter twice, I checked address2 fields for char(10) and if found, deleted them. Since this was in a physical table, I was able to use a distinct query of the fields without line feeds as my source for the rest of the package. It isn't very elegant, but it works.
June 28, 2010 at 5:06 am
Hi,
Here is a Macro for excel to carriage return or linefeed characters.
Sub KillCR()
Dim ws As Worksheet
For Each ws In Worksheets
With ws.Cells
.Replace vbCrLf, " "
.Replace vbCr, " "
.Replace vbLf, " "
End With
Next
MsgBox "Done"
End Sub
The link to the code is as below.
http://allfaq.org/forums/t/111654.aspx
Please try to see of this helps.
Thanks,
lakshmi
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply