Creating a function to replace symbols with HTML names

  • Hi guys,

    This is an interesting problem that I'm trying to solve that we normally do in C#, but for my specific project here for this coming weekend, I need to do directly in SQL.

    I have a bunch of strings (names, addresses, cities, etc) that I need to replace any nonstandard alphanumeric characters with their HTML-friendly names, i.e. replace é with é

    The only issue here is there are around 100 possible characters that I need to replace. I have a lookup table that I use in C#, but I can't think of any way to implement the same thing in SQL without having 98 nested replaces or a cursor. This will be processing around 10 fields per record on around 400k records, so I can't have completely awful performance. Slow is OK...if selecting 30,000 row batches goes from 1 second to 20 seconds, that's fine for my purposes...1 second to 30 minutes is not.

    Here is some sample code with the symbol table and data. Again, I'm trying to find a way to do this without huge nested replace statements or a cursor/etc because there are around 100 possible characters to replace:

    DECLARE @Symbols TABLE (

    Symbol NCHAR(1),

    HTML NVARCHAR(20)

    )

    INSERT INTO @Symbols

    SELECT '¡','¡' UNION ALL

    SELECT '¢','¢' UNION ALL

    SELECT '£','£' UNION ALL

    SELECT '¤','¤' UNION ALL

    SELECT '¥','¥' UNION ALL

    SELECT '¦','¦' UNION ALL

    SELECT '§','§' UNION ALL

    SELECT '¨','¨'

    DECLARE @Text NVARCHAR(50)

    SET @Text = 'Replace ¢ and £ and § with HTML-friendly strings'

    SELECT @Text

    Any help would be appreciated. Thanks in advance 😀

  • This is actually much simpler than it seems, because SELECT is implemented behind the scenes as a loop. Jeff Moden refers to this as a pseudo-cursor. You can do this in one SELECT statement.

    DECLARE @Symbols TABLE (

    Symbol NCHAR(1),

    HTML NVARCHAR(20)

    )

    INSERT INTO @Symbols

    SELECT '¡','¡' UNION ALL

    SELECT '¢','¢' UNION ALL

    SELECT '£','£' UNION ALL

    SELECT '¤','¤' UNION ALL

    SELECT '¥','¥' UNION ALL

    SELECT '¦','¦' UNION ALL

    SELECT '§','§' UNION ALL

    SELECT '¨','¨'

    DECLARE @Text NVARCHAR(50)

    SET @Text = 'Replace ¢ and £ and § with HTML-friendly strings'

    SELECT @Text = Replace(@Text, Symbol, HTML)

    FROM @Symbols

    WHERE PatIndex('%' + Symbol + '%', @Text) > 0

    SELECT @Text

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks Drew - and Jeff if he reads this.

    Only issue I'm running into now is that it's replacing characters that shouldn't be, i.e.

    DECLARE @1 NVARCHAR(200),

    @2 NVARCHAR(200)

    SELECT @1 = 'athfawthfawf', @2 = 'þ'

    SELECT PATINDEX('%' + @2 + '%',@1)

    Any occurrence of 'th' will be considered the same as 'þ' and be replaced, along with quite a few other symbols.

    I've tried different collations, even binary, but no luck in stopping it from doing this. Any ideas here?

  • I was able to get it to work with the following:

    SELECT @Text = Replace(@Text COLLATE LATIN1_GENERAL_BIN, Symbol, HTML)

    FROM @Symbols

    WHERE PatIndex('%' + Symbol + '%', @Text) > 0

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • here's anotehr version i just got working as a function;

    it's using a Tally table to swap char by char, based on the ascii() code of the character in question: i added a list of html entities i had on hand, so my list is pretty large, and those i did not have a default of i used the & # 255 ; format if i didn't have an explicit name.

    SELECT dbo.HTMLizeHighAscii( 'Replace ¢ and £ and § with HTML-friendly strings')

    i had a ton of problems getting the function to paste cleanly into the forum, because it contains html codes, so here's a link to the text file of it instead.

    HTMLizeHighAscii.txt

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks again. Drew, your suggestion worked...I was collating when I set the variable inside the function but not inside the replace statement...once I moved it there, everything seems honky dory now.

    Lowell I'll give that function a quick look too to see how it does it. I have a feeling they'll have more unusual requests like this going forward and might need a more robust solution.

    You guys are awesome, as usual. Now to go play around with pseudo-cursors and see what other stuff I can rewrite..

  • Lowell (10/18/2011)


    i had a ton of problems getting the function to paste cleanly into the forum, because it contains html codes, so here's a link to the text file of it instead.

    I had to entitize the ampersand twice. So in order to get ¥ I had to put in ¥

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply