Stripping HTML for a Table Column

  • Hello, and thank you for any help you can give me. I inherited an e-commerce website, and I've been going through the database, which is MSSQL 2000. I'm normally OK with databases, but I found that the Description column of the products is full of HTML, for formatting. Normally not a problem to just reformat the text at design time, but for over 2000 products, a little difficult.

    Is there a way that I can automate the removal of all of the html elemens, without removing the text?

    Thank you.

  • An easy way is to use Reg Expressions, i wrote this code to strip out some tagging in a sql2005 database. You will have to change the PATINDEX('%[^a-z^0-9^_-]%', Reg Exp to match the ones you need for HTML codes. Plenty of Reg Exp tutorials can be found online.

    CREATE FUNCTION [dbo].[fn_RemoveTag]

    (

    @String nvarchar(100)

    )

    RETURNS nvarchar(100)

    as

    BEGIN

    DECLARE @i int

    SET @STRING = RTRIM(@STRING)

    SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))

    WHILE (@i > 0)

    BEGIN

    SET @String = (SELECT REPLACE(@String, SUBSTRING(@String, @i, 1), ''))

    SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))

    END

    SET @STRING = REPLACE(@STRING,'--','-')

    SET @STRING = LOWER(@STRING)

    RETURN (@STRING)

    END

  • Thank you!!

    So, at this point

    SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))

    I would change it to

    SET @i = (SELECT PATINDEX('%htmltags', @String))

    ?

  • that is the right place to change it, but you need to supply a Reg Exp something similar to

    SET @i = (SELECT PATINDEX((>(?: (? [^<]*))) ,@String))

    should work, i am unable to test it at the moment, you may need to read up on Reg Expressions

    to work out the exact pattern.

    Failing this you could always use the REPLACE function in sql, look in BOL for more info.

  • Um.... SQL Server 2000 doesn't have the full compliment of RegEx operators, so I'm pretty sure that won't work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Walking through the string and stripping out the tags would be pretty straightforward too. This also handles a few common special characters used in HTML, like amp; and quot;.

    I couldn't figure out how to add the ampersand within the code tag below so if you try running this make sure to add it to each of the HTML codes before creating the function.

    -- Usage

    SELECT dbo.ufn_RemoveTags('<FONT color=#FF0000>Red ballons</FONT&gt')

    --

    CREATE FUNCTION ufn_RemoveTags(@string VARCHAR(4000))

    RETURNS VARCHAR(4000)

    AS

    BEGIN

    DECLARE @newstring VARCHAR(4000)

    SET @newstring = ''

    DECLARE @i INT

    SET @i = 1

    WHILE @i <= LEN(@string)

    BEGIN

    DECLARE @character AS CHAR(1)

    SET @character = substring(@string,@i,1)

    IF @character = '<'

    SET @i = charindex('>',@string,@i)

    ELSE

    SET @newstring = @newstring + @character

    SET @i = @i + 1

    END

    --!! WARNING !!

    -- Add the ampersand to the beginning of each entity prior to creating the function.

    SET @newstring = REPLACE(@newstring,'gt;','>')

    SET @newstring = REPLACE(@newstring,'lt;','<')

    SET @newstring = REPLACE(@newstring,'nbsp',' ')

    SET @newstring = REPLACE(@newstring,'amp;','&')

    SET @newstring = REPLACE(@newstring,'quot;','"')

    RETURN @newstring

    END

  • Thank you so much for all of your replies. Sorry I'm not a SQL literate person so I can contribute a bit more. I'm just gonna have to listen to you guys. It's been very enlightening reading through the posts.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply