April 4, 2008 at 10:24 am
Hello, and thank you for any help you can give me. I inherited an e-commerce website, and I've been going through the database, which is MSSQL 2000. I'm normally OK with databases, but I found that the Description column of the products is full of HTML, for formatting. Normally not a problem to just reformat the text at design time, but for over 2000 products, a little difficult.
Is there a way that I can automate the removal of all of the html elemens, without removing the text?
Thank you.
April 4, 2008 at 10:42 am
An easy way is to use Reg Expressions, i wrote this code to strip out some tagging in a sql2005 database. You will have to change the PATINDEX('%[^a-z^0-9^_-]%', Reg Exp to match the ones you need for HTML codes. Plenty of Reg Exp tutorials can be found online.
CREATE FUNCTION [dbo].[fn_RemoveTag]
(
@String nvarchar(100)
)
RETURNS nvarchar(100)
as
BEGIN
DECLARE @i int
SET @STRING = RTRIM(@STRING)
SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))
WHILE (@i > 0)
BEGIN
SET @String = (SELECT REPLACE(@String, SUBSTRING(@String, @i, 1), ''))
SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))
END
SET @STRING = REPLACE(@STRING,'--','-')
SET @STRING = LOWER(@STRING)
RETURN (@STRING)
END
April 4, 2008 at 11:09 am
Thank you!!
So, at this point
SET @i = (SELECT PATINDEX('%[^a-z^0-9^_-]%', @String))
I would change it to
SET @i = (SELECT PATINDEX('%htmltags', @String))
?
April 4, 2008 at 5:38 pm
that is the right place to change it, but you need to supply a Reg Exp something similar to
SET @i = (SELECT PATINDEX((>(?: (? [^<]*))) ,@String))
should work, i am unable to test it at the moment, you may need to read up on Reg Expressions
to work out the exact pattern.
Failing this you could always use the REPLACE function in sql, look in BOL for more info.
April 4, 2008 at 8:29 pm
Um.... SQL Server 2000 doesn't have the full compliment of RegEx operators, so I'm pretty sure that won't work.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2008 at 11:17 am
Walking through the string and stripping out the tags would be pretty straightforward too. This also handles a few common special characters used in HTML, like amp; and quot;.
I couldn't figure out how to add the ampersand within the code tag below so if you try running this make sure to add it to each of the HTML codes before creating the function.
-- Usage
SELECT dbo.ufn_RemoveTags('<FONT color=#FF0000>Red ballons</FONT>')
--
CREATE FUNCTION ufn_RemoveTags(@string VARCHAR(4000))
RETURNS VARCHAR(4000)
AS
BEGIN
DECLARE @newstring VARCHAR(4000)
SET @newstring = ''
DECLARE @i INT
SET @i = 1
WHILE @i <= LEN(@string)
BEGIN
DECLARE @character AS CHAR(1)
SET @character = substring(@string,@i,1)
IF @character = '<'
SET @i = charindex('>',@string,@i)
ELSE
SET @newstring = @newstring + @character
SET @i = @i + 1
END
--!! WARNING !!
-- Add the ampersand to the beginning of each entity prior to creating the function.
SET @newstring = REPLACE(@newstring,'gt;','>')
SET @newstring = REPLACE(@newstring,'lt;','<')
SET @newstring = REPLACE(@newstring,'nbsp',' ')
SET @newstring = REPLACE(@newstring,'amp;','&')
SET @newstring = REPLACE(@newstring,'quot;','"')
RETURN @newstring
END
April 14, 2008 at 8:50 am
Thank you so much for all of your replies. Sorry I'm not a SQL literate person so I can contribute a bit more. I'm just gonna have to listen to you guys. It's been very enlightening reading through the posts.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply