December 18, 2006 at 3:55 pm
I just realised that a task like this has been discussed already:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=320575&p=2
and I posted some code for SQL Server 2005 there.
After the CLR function dbo.Regex_Replace is compiled in .NET, all you need is one function call to do the cleaning task:
select mystring, dbo.Regex_Replace(mystring, '<[^<>]*>', '') as myCleanString from dbo.X
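For anyone without SQL Server 2005 at hand, here is a rough sketch of what that pattern does, written in Python instead of the CLR function. The name strip_tags and the sample string are made up for illustration; only the pattern itself comes from the query above.

```python
import re

# Same pattern as in the query above: a "<", then any run of characters
# that are neither "<" nor ">", then a ">".
TAG_PATTERN = re.compile(r"<[^<>]*>")

def strip_tags(text: str) -> str:
    """Remove every <...> run from text (hypothetical helper, not the CLR function)."""
    return TAG_PATTERN.sub("", text)

print(strip_tags("<p>Hello <b>world</b></p>"))
```

Every run between "<" and ">" is dropped, whether or not it is a real HTML tag.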
December 18, 2006 at 4:00 pm
Sergiy,
*normal simple mind people* is the last notion that comes to my mind when I read your posts. I enjoy them a lot though.
Stop by the regex site where I hang around:
http://regexadvice.com/forums/68/ShowForum.aspx
It could be helpful.
December 19, 2006 at 5:42 pm
I don't have SQL 2005, so I cannot test your function against my example.
What I'm trying to say is that this task is not just about replacing the text between "<" and ">", including these symbols, with an empty string.
It's much more complex.
You must find the opening tag, make sure it really is an opening tag, find the nearest corresponding closing tag, and only after that can you remove it.
Otherwise your function will work like this site - removing all words surrounded by "<" and ">" even when they have nothing to do with HTML tags.
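A small sketch of that failure mode, in Python rather than T-SQL since it only needs a regex engine. The function name and the sample text are invented for illustration; the pattern is the simple one from earlier in the thread.

```python
import re

def strip_angle_runs(text: str) -> str:
    """Apply the simple pattern: delete anything between "<" and ">" (hypothetical helper)."""
    return re.sub(r"<[^<>]*>", "", text)

# Plain comparison operators, no HTML anywhere in this string.
sample = "price<10 and qty>5"
print(strip_angle_runs(sample))
```

The "<10 and qty>" part is treated as a tag and removed, even though it has nothing to do with HTML.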
December 19, 2006 at 9:39 pm
You are absolutely right, Sergiy: that's exactly the procedure that must be followed. For that purpose, more complex regular expression patterns exist. For example:
match the opening tag <u>, then the text in between, then the closing tag </u>; then remove the matched tags.
Moreover, there is a way to specify a generic tag in a pattern like the one above.
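One way such a generic pattern could look, sketched in Python. This is my own guess at the idea, not the code from the other thread: the tag name is captured in group 1 and the closing tag must repeat it through a backreference. The names PAIRED_TAG and unwrap_tags are hypothetical.

```python
import re

# Group 1: tag name; group 2: text between the tags.
# The closing tag must carry the same name as the opening one (\1).
PAIRED_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*>([^<>]*)</\1>")

def unwrap_tags(text: str) -> str:
    """Replace each matched open/close pair with its inner text.

    Repeats until nothing changes, so nested pairs are unwrapped
    from the inside out.
    """
    previous = None
    while previous != text:
        previous = text
        text = PAIRED_TAG.sub(r"\2", text)
    return text

print(unwrap_tags("<u>keep</u> and price<10 and qty>5"))
print(unwrap_tags("<b><i>x</i></b>"))
```

Text like "price<10 and qty>5" is left alone because "<10" does not start with a tag name and there is no matching closing tag. This is still only a sketch: it does not deal with comments, self-closing tags, or ">" inside attribute values.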
December 19, 2006 at 9:52 pm
As an example, it's possible to render the original text (from, say, field "mystring" in table dbo.X)
<a href="http://www.sergiy.org">my_target_text</a><some_other_not_targeted_tagged_text>
to
my_target_text<some_other_not_targeted_tagged_text>
by using this CLR-based Regex function call:
select mystring, dbo.Regex_Replace(mystring, '(<a[^<>]*>([^<>]*)(</a>', '$2') as myCleanString from dbo.X
In the call, only group 2 ($2) of each regex match is kept in the original text.
December 19, 2006 at 10:11 pm
Sorry, the pattern got messed up in my previous post (hopefully it works now):
select mystring, dbo.Regex_Replace(mystring, '(<a[^<>]*> ) ([^<>]*) ( </a> )', '$2') as myCleanString from dbo.X
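A quick way to check the idea outside SQL Server, sketched in Python. Two assumptions here: the spaces in the pattern above are only there to get it past the forum formatting, and the sample href is a cleaned-up guess at the intended link. Note that Python writes the group reference as \2 where .NET uses $2. The names ANCHOR, SPACED and unwrap_anchor are made up for this sketch.

```python
import re

# The pattern with the spaces removed (assumed to be the intended form).
ANCHOR = re.compile(r"(<a[^<>]*>)([^<>]*)(</a>)")

# The pattern exactly as posted; it behaves the same only if whitespace in
# the pattern is ignored (re.VERBOSE here, similar in spirit to
# RegexOptions.IgnorePatternWhitespace in .NET).
SPACED = re.compile(r"(<a[^<>]*> ) ([^<>]*) ( </a> )", re.VERBOSE)

def unwrap_anchor(text: str, pattern: re.Pattern = ANCHOR) -> str:
    """Keep only group 2 (the link text) of each matched <a ...>...</a> pair."""
    return pattern.sub(r"\2", text)

sample = ('<a href="http://www.sergiy.org">my_target_text</a>'
          '<some_other_not_targeted_tagged_text>')
print(unwrap_anchor(sample))
print(unwrap_anchor(sample, SPACED))
```

Both calls leave the trailing non-anchor text untouched. Without a whitespace-ignoring option the spaced pattern would look for literal spaces around the link text and match nothing in this sample.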
December 21, 2006 at 9:22 am
Remi,
Thought you might want to see the rest of the thread. Thanks.
Sergei
December 21, 2006 at 9:25 am
I've been following it. I just had nothing else to add. Regex is the only [simple] solution for this problem, and it is far from my area of expertise, so I have nothing else to say.