November 6, 2006 at 4:04 am
Hi
I have a feild in which the data is being saved along with the HTML tags.Now I have to extract the text from this field.
The data will be like this
<STRONG><FONT color=#ff0000>Test update</FONT></STRONG>
Now i need to extract "Test Update" from this.
November 6, 2006 at 4:22 am
What makes you think that T-SQL is the best tool to read HTML?
_____________
Code for TallyGenerator
November 6, 2006 at 4:31 am
Well the developers have saved the data like this.Now they just need to extract the text out of this.
November 6, 2006 at 4:40 am
It's a problem of those developers.
Let them read the text as they saved it and do with it whatever they want in their application.
Database is DATA-BASE.
It's for saving data, not rubbish.
HTML text should not be allowed in database at all.
_____________
Code for TallyGenerator
November 6, 2006 at 7:46 am
Arun
Sergiy is quite right, of course. Now, to answer your question... the following should work. Somebody might be able to think of something more concise. It should work no matter how many HTML tags you have, although I've only tried it on the example you gave.
John
declare
@htmlstring varchar(256)
set
@htmlstring = '<STRONG><FONT color=#ff0000>Test update</FONT></STRONG>'
set
@htmlstring = left(@htmlstring, charindex('</', @htmlstring) - 1)
set
@htmlstring = reverse(@htmlstring)
set
@htmlstring = left(@htmlstring, charindex('>', @htmlstring) - 1)
set
@htmlstring = reverse(@htmlstring)
select
@htmlstring
November 6, 2006 at 7:56 am
What happens if there are more than one line of text to extract in a single row?!?!
November 6, 2006 at 8:00 am
Ok, this ain't set based... but I don't have time to rewrite it at the moment. It'll at least give you another option if needed.
CREATE FUNCTION dbo.fnStripHtmlTags (@Text AS VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Return AS VARCHAR(8000)
DECLARE @StartPos AS SMALLINT
DECLARE @Loops AS SMALLINT
SET @Loops = LEN(@Text) - LEN (REPLACE(@Text, '<', ''))
WHILE @Loops > 0
BEGIN
SET @StartPos = CHARINDEX ('<', @Text, 1)
SET @Text = STUFF(@Text, @StartPos, CHARINDEX ('>', @Text, @StartPos + 1) - @StartPos + 1, '')
SET @Loops = @Loops - 1
END
SET @Return = @Text
RETURN @Return
END
But as everyone else said and will say. save the data correctly in the database, or let 'em take the trash out.
Garbage IN, Garbage OUT!
November 6, 2006 at 8:39 am
i'd just delete everything that is in between the tags <.....>, ending up with the targeted text. One function call would do it.
November 6, 2006 at 8:45 am
Isn't it what my function do?
November 6, 2006 at 9:09 am
sorry, wasn't attentive enough..yes your function does the same thing..
November 6, 2006 at 9:15 am
... but if I had to rebuild me dunction it would use John Mitchell's approach... would be much more efficiant that way.
November 6, 2006 at 9:49 am
being the official RegEx Junkie
of the forum, i'll post regular expression's solution for
this cleaning task
(for SQL 2005):
using pre
-compiled CLR scalar-valued function dbo.Regex_Replace, which takes 2 params: 1) string to clean
2
) regular expression pattern
and outputs the cleaned string
Create
table X
(
myid int identity,
mystring
varchar(1000) not null)
GO
Insert
into dbo.X (mystring)
Select
'<STRONG><FONT color=#ff0000>Test update</FONT></STRONG><STRONG><FONT color=#ff0000>Test update</FONT></STRONG>'
--do cleaning here
select
mystring, dean.dbo.Regex_Replace(mystring, '<[^<>]*>', '') from dbo.X
--rendering the input:
<
STRONG><FONT color=#ff0000>Test update</FONT></STRONG><STRONG><FONT color=#ff0000>Test update</FONT></STRONG>
--to the cleaned form:
Test updateTest
update
C# code
for the CLR function: [compiled in NET 2.0]
using System
;
using System
.Data;
using System
.Data.SqlClient;
using System
.Data.SqlTypes;
using Microsoft
.SqlServer.Server;
using System
.Text.RegularExpressions;
public
partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString Regex_Replace(String s, String match_pattern, String replace_pattern)
{
// Put your code here
Regex r
= new Regex(match_pattern, RegexOptions.Multiline|RegexOptions.IgnoreCase);
return r.Replace(s, replace_pattern).ToString();
}
};
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply