May 9, 2009 at 1:59 am
I have been using CSV files for a one-off data conversion - it is the only mechanism I have available. The good thing is that I have control of both the export and the import.
One of the fields I am extracting has a TEXT datatype and it contains free-text notes. There are all sorts of problematic characters in there (double quotes, single quotes, commas, tabs - you name it). These characters need to be removed or replaced as they are extracted, or they will cause a problem on upload.
But there does not appear to be any way to run a REPLACE command on a text field - either in T-SQL or in an SSIS Script Component.
In the end, I CAST the field to Varchar(4000) on extract and was able to use replace on that. Some data was truncated, but it was an acceptably small amount. But I would love to know whether anyone else has dealt with this problem without losing data? I would like to be able to do this in the pipeline somehow and not as a bulk replace on the CSV file once it has been created.
Cheers for any input
Phil
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 7:42 am
Phil Parkin (5/9/2009)
I have been using CSV files for a one-off data conversion - it is the only mechanism I have available. The good thing is that I have control of both the export and the import.One of the fields I am extracting has a TEXT datatype and it contains free-text notes. There are all sorts of problematic characters in there (double quotes, single quotes, commas, tabs - you name it). These characters need to be removed or replaced as they are extracted, or they will cause a problem on upload.
But there does not appear to be any way to run a REPLACE command on a text field - either in T-SQL or in an SSIS Script Component.
In the end, I CAST the field to Varchar(4000) on extract and was able to use replace on that. Some data was truncated, but it was an acceptably small amount. But I would love to know whether anyone else has dealt with this problem without losing data? I would like to be able to do this in the pipeline somehow and not as a bulk replace on the CSV file once it has been created.
Cheers for any input
Phil
Phil,
What about using derived column with expression using REPLACE function in it?
May 11, 2009 at 8:14 am
Thanks for the idea - tried it and:
The function REPLACE does not support the data type "DT_TEXT" for parameter number 1. The type of the parameter could not be explicitly cast into a compatible type for the function. To perform this function, the operand needs to be explicitly cast with a cast operator.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 9:01 am
Phil Parkin (5/11/2009)
Thanks for the idea - tried it and:The function REPLACE does not support the data type "DT_TEXT" for parameter number 1. The type of the parameter could not be explicitly cast into a compatible type for the function. To perform this function, the operand needs to be explicitly cast with a cast operator.
What is the issue when you try to use script component?
May 11, 2009 at 9:08 am
I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.
Here is that script:
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim rowString As String
rowString = Me.ConvertBlobToString(Row.RawDataLine)
Row.CleanLine = Replace(rowString, "$", String.Empty)
End Sub
Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String
If blobColumn Is Nothing Then
Return (String.Empty)
Else
Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))
End If
End Function
End Class
--Ramesh
May 11, 2009 at 9:12 am
CozyRoc (5/11/2009)
What is the issue when you try to use script component?
OK, I did this in a test environment to check. I have an OLEDB Source which just does a SELECT TOP 10 LongNotes ... where LongNotes is a text field.
Add a Script Component and configure the input column, LongNotes, to be ReadWrite.
Then, in the script, I added the following line:
Row.longnotes = Replace(Row.longnotes, "a", "b")
Which does not validate. The error is:
Value of type Microsoft.SqlServer,Dts.Pipeline.BlobColumn cannot be converted to string.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 9:16 am
Ramesh (5/11/2009)
I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.Here is that script:
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim rowString As String
rowString = Me.ConvertBlobToString(Row.RawDataLine)
Row.CleanLine = Replace(rowString, "$", String.Empty)
End Sub
Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String
If blobColumn Is Nothing Then
Return (String.Empty)
Else
Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))
End If
End Function
End Class
Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 9:29 am
Phil Parkin (5/11/2009)
Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?
I guess the limit of String data type is limited to the available memory or 2GB (i.e. 2^31 Unicode characters).
See this MSDN link for more info
http://msdn.microsoft.com/en-us/library/thwcx436(VS.80).aspx
--Ramesh
May 11, 2009 at 9:30 am
Phil Parkin (5/11/2009)
Ramesh (5/11/2009)
I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.Here is that script:
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim rowString As String
rowString = Me.ConvertBlobToString(Row.RawDataLine)
Row.CleanLine = Replace(rowString, "$", String.Empty)
End Sub
Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String
If blobColumn Is Nothing Then
Return (String.Empty)
Else
Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))
End If
End Function
End Class
Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?
The solution Ramesh has sent will work fine for up to 2GB of data, based on the fact it accepts as length parameter Int32. You can check here for another TSQL-based solution.
May 11, 2009 at 9:39 am
CozyRoc & Ramesh - great work, thanks for the help.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 7:31 pm
Hi guys - I bet you thought that this was all over? :hehe:
I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?
I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.
Surely I need such a function to avoid the 4,000 char limit - or am I missing something?
Thanks again.
Phil
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 11, 2009 at 10:49 pm
Phil Parkin (5/11/2009)
Hi guys - I bet you thought that this was all over? :hehe:I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?
I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.
Surely I need such a function to avoid the 4,000 char limit - or am I missing something?
Thanks again.
Phil
Phil,
I'm guessing the target column is either DT_TEXT (ANSI Text) or DT_NTEXT (Unicode). Check below for ConvertStringToBlob function:
Private Sub ConvertStringToBlob(ByVal input As String, ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn)
If input Is Nothing Then
Call blobColumn.SetNull()
Else
Call blobColumn.AddBlobData(System.Text.Encoding.Default.GetBytes(input))
End If
End Sub
May 11, 2009 at 11:27 pm
CozyRoc (5/11/2009)
Phil Parkin (5/11/2009)
Hi guys - I bet you thought that this was all over? :hehe:I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?
I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.
Surely I need such a function to avoid the 4,000 char limit - or am I missing something?
Thanks again.
Phil
Phil,
I'm guessing the target column is either DT_TEXT (ANSI Text) or DT_NTEXT (Unicode). Check below for ConvertStringToBlob function:
Private Sub ConvertStringToBlob(ByVal input As String, ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn)
If input Is Nothing Then
Call blobColumn.SetNull()
Else
Call blobColumn.AddBlobData(System.Text.Encoding.Default.GetBytes(input))
End If
End Sub
CozyRoc, your sub nailed it. Ace work 😎
I also tried adding a new field to the Output Buffer, type DT_TEXT, then the following code:
Output0Buffer.AddRow()
Output0Buffer.FixNotes = ConvertBlobToString(Row.longnotes)
where FixNotes is the name of the field I added to the Output Buffer and LongNotes is the Text field. This does not validate - I get a message about the field FixNotes being read only.
But the problem is now resolved and, in future, I am sure I will definitely use this.
Phil
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 12, 2009 at 4:08 am
Phil Parkin (5/11/2009)
Hi guys - I bet you thought that this was all over? :hehe:I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?
I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.
Surely I need such a function to avoid the 4,000 char limit - or am I missing something?
Thanks again.
Phil
In this case, "CleanLine" is indeed a DT_WSTR with 4000 max. characters, but I assumed:-) that you understood that "CleanLine" needs to be reversed to TEXT data before sending it to output buffer.
--Ramesh
May 12, 2009 at 4:16 am
Ramesh (5/12/2009)
In this case, "CleanLine" is indeed a DT_WSTR with 4000 max. characters, but I assumed:-) that you understood that "CleanLine" needs to be reversed to TEXT data before sending it to output buffer.
Hey, I'm not as good at this stuff as you think! 😀 It's not as if this 'reversing' is as trivial as you have made it sound either!
Got there in the end though, thanks to you guys. I have a really neat solution now.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply