July 16, 2012 at 4:41 pm
Hello everyone
I have a table with an ntext field that contains many tags of different types. I want to preserve the text and remove the tags when I retrieve the data with a select. I don’t want to update the table, just select the text without the tags.
For example:
'<div> We would like the comments box to be available </div> '
Would become:
‘We would like the comments box to be available’
The only tool I have found searching this site and others is a UDF and of course I want to avoid being chastised for going RBAR so I thought I’d come to you folks
I have attached some sample data. If you can point me in a direction, it would be much appreciated.
create table #damnTags
(
NtextField ntext
)
insert #damnTags
select '<div> We would like the comments box to be available on the Bed Check at all times,
even if a client is not marked Present. Thanks, Tim</div> '
union
select '<div> <b>Edited by David Orez on Monday, April 30, 2012 at 9:28 AM</b></div> <div> Payroll Utility</div>'
select * from #damnTags
July 16, 2012 at 5:01 pm
Do you want to remove the tags in the data, or remove them once you've selected things?
July 17, 2012 at 6:18 am
Only in the query. I don't want to update the table.
July 17, 2012 at 6:23 am
In this case, as far as I can see, a UDF is your most straightforward solution. I quite like this one:
create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as
begin
declare @textXML xml
declare @result varchar(max)
set @textXML = @text;
with doc(contents) as
(
select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)
)
select @result = contents.value('.', 'varchar(max)') from doc
return @result
end
as it at least uses some fancy XML (original here.)
If your HTML is not well formed (in which case the above function would fail), your best performing solution is likely to be a CLR/Regex-type thing.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 17, 2012 at 7:18 am
Thanks much, Phil for both the code and the link.
I was leaning toward CLR but this is a third party database and makes me a bit nervous. I can't see what harm it would do but making this kind of an addition/change is the type of thing that can cause finger pointing should strange issues arise (No wonder you're having trouble with our app! Who knows what else you done to the database!!!).
I believe I will request the blessings of the powers that be and head down that road if allowed. Maybe I can place the CLR function in one database and call it from another.
July 17, 2012 at 9:22 am
There should be no issue at all if you go down the path you suggest in your last sentence. You are not updating data, merely reading it.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 17, 2012 at 1:30 pm
Hi Phil
I tested the UDF and it works great. But I am having a tough time wrapping my brain around or uncovering what is going on here. You have a CTE whose output is being stripped of the tags and returned by the function but I'm not able to see deeper than that.
Regarding " select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)", it seems to be turning the string into XML but I can't see where chunk.chunk.QUERY come from (looks like a function). And is "nodes" an attribute of an xml string that can be used as a function?
Regarding " contents.value('.', 'varchar(max)') from doc", it looks like contents.value refers to the value returned by the CTE but how do the supplied parameters strip the xml string of its tags.
I'm hoping you or someone else has the time to explain this to me.
July 17, 2012 at 1:46 pm
Yeah, you'll be needing a SQL/XML ninja for that explanation:w00t: I only found that function today while hunting for a solution to your problem and I haven't yet got to the bottom of how it works either. I need to do some dissection.
I do have a slight head start on you though. There are a whole load of XML-specific methods available to you in SQL Server 2005+ (nodes and query both fall into this category). See here for some info and welcome to a new world: methods in T-SQL, who'd have thought?
Have fun - I'll post back if I do that analysis.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 23, 2012 at 5:05 pm
Oky Doky.
Just got back from vacation and wanted to get to this ASAP. I am not optimizing this query, just getting the desired results for now.
First, no UDF needed. Second, there is a specific best practice I will ignore here; I will call it out in the end.
If your NtextField was stored in XML format you could do this:
IF OBJECT_ID('tempdb..#dangTags') IS NOT NULL
DROP TABLE #dangTags
CREATE TABLE #dangTags
(
NtextField XML
)
insert #dangTags
select '<div> We would like the comments box to be available on the Bed Check at all times,
even if a client is not marked Present. Thanks, Tim</div> '
union
select '<div> <b>Edited by David Orez on Monday, April 30, 2012 at 9:28 AM</b></div> <div> Payroll Utility</div>'
select CAST((NtextField.query('//text()')) AS NVARCHAR(max)) AS NtextField
from #dangTags
Since it isn't you would do this
IF OBJECT_ID('tempdb..#dangTags') IS NOT NULL
DROP TABLE #dangTags
CREATE TABLE #dangTags
(
NtextField ntext
)
insert #dangTags
select '<div> We would like the comments box to be available on the Bed Check at all times,
even if a client is not marked Present. Thanks, Tim</div> '
union
select '<div> <b>Edited by David Orez on Monday, April 30, 2012 at 9:28 AM</b></div> <div> Payroll Utility</div>'
select CAST((CAST(NtextField AS XML).query('//text()')) AS NVARCHAR(max)) AS NtextField
from #dangTags
The only difference is...
This:
CAST((NtextField.query('//text()')) AS NVARCHAR(max)) AS NtextField
is now this:
CAST((CAST(NtextField AS XML).query('//text()')) AS NVARCHAR(max)) AS NtextField
NTextField.query was changed to
CAST(NtextField AS XML).query
Bad habit:
When doing XPath (within T-SQL or elsewhere) never use the "//" unless you absolutely must.
-- Itzik Ben-Gan 2001
July 24, 2012 at 1:49 am
Aha: XMLSQLNinja - just what the doctor ordered! 🙂
That's a cool solution - can you explain how it works?
And if you've got time, perhaps you could also have a go at explaining the 'dog-food' (chunks.chunk) solution?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 25, 2012 at 1:44 pm
XMLSQLNinja, thank for your time and alternative approach. I'm afraid that the garbage in my data is not well formed and this method stumbles as a result as does the previous one. Regardless, my thanks for your time.
July 25, 2012 at 2:27 pm
jshahan (7/25/2012)
XMLSQLNinja, thank for your time and alternative approach. I'm afraid that the garbage in my data is not well formed and this method stumbles as a result as does the previous one. Regardless, my thanks for your time.
That's a bummer.
In the future consider using the XML datatype. If you can figure out what records are bad and get them fixed then you change that NText column to XML (e.g. export the contents of the target table somewhere, recreate the table using the XML datatype instead of NText, then import the data back in.)
Once you have an XML column no XML data can be added unless it is well-formed. Note the code below:
DECLARE @x table
(
ID int IDENTITY(1,1),
xmldata XML
)
INSERT INTO @x VALUES ('<xxx>blah blah</xxx>'),('<xxx><yyy>blah blah</yyy></xxx>')
SELECT * FROM @x
-- All good up to this point. Now try bad XML:
INSERT INTO @x VALUES ('<xyz>blah blah</abc>')
-- This will produce an error
-- Itzik Ben-Gan 2001
July 25, 2012 at 2:37 pm
It is a third party package overwhich I had no control but even if I did, I didn't know that the xml datatype was capable of that type of constraint. Thanks for the education.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply