August 31, 2011 at 3:15 am
I'm doing some performance tuning for a ETL process. In a particular string fields we have a a varied amount of things, but any "special characters" are encoded as %XX where XX is a 2 character hex value. For example a name would appear as Smith%2c John. I want to replace these character with there actual CHAR value so the above example would be Smith, John. Sounds fairly straight forward. We already have a method of doing this using a UDF which convert the hex to a int and then take the CHAR of that value. However for performance reason I want to avoid using a UDF. Note these strings can contain multiple encoded characters and I do not have a complete list so I cannot just use nested REPLACE function to give me what I want.
Ideally I want to do this inline, without use of UDF's or loops. This needs to work on SQL Server 2005 and 2008.
As anyone got and cunning technique to do this, as I've been trying to think out something but nothing gives me everything I'm after
August 31, 2011 at 3:08 pm
It is possible to do something like this with recursive cte's.
This is what I managed to come up with, and I think it will do what you need it to do:
declare @String_Table table
(
String_Value varchar(100)
)
insert into @String_Table
select'Lorem ipsum dolor sit amet%2c consectetur adipiscing elit%2c'
union all
select'1056 %2d Ut pharetra%2c nulla ut venenatis'
-- the first cte replaces the first special character it finds and returns the result
-- it will then call itself recursively if more special characters were found
;with Recursive_CTE
as
(
-- anchor member
selectst.String_Value as Original_String
--,charindex('%',st.String_Value,1) as Special_Character_Index
--,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2) as Hex_Value
--,convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2) as Hex_Value_Binary_Equivalent
--,char(convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2)) as Converted_Hex_Value
,replace
(
st.String_Value
,'%' + substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2)
,char(convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2))
) as Intermediate_Result
,0 as CTE_Iteration
from@String_Table st
wherecharindex('%',st.String_Value,1) > 0
union all
-- recursive member
selectcte.Original_String
--,charindex('%',cte.Intermediate_Result,1) as Special_Character_Index
--,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2) as Hex_Value
--,convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2) as Hex_Value_Binary_Equivalent
--,char(convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2)) as Converted_Hex_Value
,replace
(
cte.Intermediate_Result
,'%' + substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2)
,char(convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2))
) as Intermediate_Result
,CTE_Iteration + 1 as CTE_Iteration
fromRecursive_CTE cte
wherecharindex('%',cte.Intermediate_Result,1) > 0
)
,
-- the second cte ranks the output of the first cte
-- to make it easier to find the last entry for each string value
Ranked_Output_CTE
as
(
selectcte.Original_String
,cte.Intermediate_Result
,row_number() over
(
partition by cte.Original_String
order by cte.CTE_Iteration desc
) as Row_Counter
fromRecursive_CTE cte
)
-- finally, we return the result
selectcte.Original_String
,cte.Intermediate_Result as String_Result
fromRanked_Output_CTE cte
wherecte.Row_Counter = 1
Just a word of caution though...to prevent infinite loops, the number of recursion levels are by default limited to 100.
This means that the query will fail if more than 100 distinct "special characters" are found in this case. It is possible to overwrite the default with the MAXRECURSION query hint, but be aware that it could still potentially fail if you have large string values.
Hope this helps.
Martin.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply