Replace text string containing 2 character hex values with there CHAR values

  • I'm doing some performance tuning for a ETL process. In a particular string fields we have a a varied amount of things, but any "special characters" are encoded as %XX where XX is a 2 character hex value. For example a name would appear as Smith%2c John. I want to replace these character with there actual CHAR value so the above example would be Smith, John. Sounds fairly straight forward. We already have a method of doing this using a UDF which convert the hex to a int and then take the CHAR of that value. However for performance reason I want to avoid using a UDF. Note these strings can contain multiple encoded characters and I do not have a complete list so I cannot just use nested REPLACE function to give me what I want.

    Ideally I want to do this inline, without use of UDF's or loops. This needs to work on SQL Server 2005 and 2008.

    As anyone got and cunning technique to do this, as I've been trying to think out something but nothing gives me everything I'm after

  • It is possible to do something like this with recursive cte's.

    This is what I managed to come up with, and I think it will do what you need it to do:

    declare @String_Table table

    (

    String_Value varchar(100)

    )

    insert into @String_Table

    select'Lorem ipsum dolor sit amet%2c consectetur adipiscing elit%2c'

    union all

    select'1056 %2d Ut pharetra%2c nulla ut venenatis'

    -- the first cte replaces the first special character it finds and returns the result

    -- it will then call itself recursively if more special characters were found

    ;with Recursive_CTE

    as

    (

    -- anchor member

    selectst.String_Value as Original_String

    --,charindex('%',st.String_Value,1) as Special_Character_Index

    --,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2) as Hex_Value

    --,convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2) as Hex_Value_Binary_Equivalent

    --,char(convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2)) as Converted_Hex_Value

    ,replace

    (

    st.String_Value

    ,'%' + substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2)

    ,char(convert(varbinary,substring(st.String_Value,charindex('%',st.String_Value,1) + 1,2),2))

    ) as Intermediate_Result

    ,0 as CTE_Iteration

    from@String_Table st

    wherecharindex('%',st.String_Value,1) > 0

    union all

    -- recursive member

    selectcte.Original_String

    --,charindex('%',cte.Intermediate_Result,1) as Special_Character_Index

    --,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2) as Hex_Value

    --,convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2) as Hex_Value_Binary_Equivalent

    --,char(convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2)) as Converted_Hex_Value

    ,replace

    (

    cte.Intermediate_Result

    ,'%' + substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2)

    ,char(convert(varbinary,substring(cte.Intermediate_Result,charindex('%',cte.Intermediate_Result,1) + 1,2),2))

    ) as Intermediate_Result

    ,CTE_Iteration + 1 as CTE_Iteration

    fromRecursive_CTE cte

    wherecharindex('%',cte.Intermediate_Result,1) > 0

    )

    ,

    -- the second cte ranks the output of the first cte

    -- to make it easier to find the last entry for each string value

    Ranked_Output_CTE

    as

    (

    selectcte.Original_String

    ,cte.Intermediate_Result

    ,row_number() over

    (

    partition by cte.Original_String

    order by cte.CTE_Iteration desc

    ) as Row_Counter

    fromRecursive_CTE cte

    )

    -- finally, we return the result

    selectcte.Original_String

    ,cte.Intermediate_Result as String_Result

    fromRanked_Output_CTE cte

    wherecte.Row_Counter = 1

    Just a word of caution though...to prevent infinite loops, the number of recursion levels are by default limited to 100.

    This means that the query will fail if more than 100 distinct "special characters" are found in this case. It is possible to overwrite the default with the MAXRECURSION query hint, but be aware that it could still potentially fail if you have large string values.

    Hope this helps.

    Martin.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply