Removing or Replacing Data

Question

Removing or Replacing Data

AndrewSQLDBA

SSC-Insane

Points: 21827
More actions
July 20, 2011 at 10:59 am

#244542

Hello Everyone
I am working on some Health Care Data, I know, you don't have to rub it in.
Some of the data that is being pumped in from HL7(who ever came up with this standard, needs to have their @$$ removed). Some of the data has what looks like XML tags all thru it
example: <OBX.5><OBX.5.1>Blah Blah Data</OBX.5.1></OBX.5>
And of course with the ending tags as well.
I need to remove all of the tags. Anyone has an idea of how I can remove all the tags from within a column? So that the data comes out to be only the
Blah Blah Data
There may be many of the tags, with all different numbers inside.
Thank You in advance
Andrew SQLDBA

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 1

Have you tried the REPLACE() function?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 2

The long and nasty way is to loop through the characters in the data.

Use loops and find first and last '<', '>'.

if < & > are not in the actual data it is fairly simple to remove first and last for each <obx.5> which works for <tag> and </mytag>

This could take awhile depending upon how many columns. I would prefer to clean the chaff out of the target table and then put a stripping step in the data import.

good luck

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 3

Yes, I have thought of that. But what I am not getting is all of the number within the tags.

So my data looks like this if there are more than 2 digits within the tags originally.

<.5.1>

How can I ensure that everything within the tags, and including the tags get replaced?

Andrew SQLDBA

Lowell SSC Guru Points: 323518 More actions · Answer 4

i have this saved in my snippets for stripping out HTML tags. note this assumes that well formed html is in palce...so % lt ; should be there for < symbols that exist in the text data portion.

declare @htmlText varchar(max)

--===== Replace all HTML tags with nothing

WHILE CHARINDEX('<',@HTMLText) > 0

SELECT @HTMLText = STUFF(@HTMLText,

CHARINDEX('<',@HTMLText),

CHARINDEX('>',@HTMLText)-CHARINDEX('<',@HTMLText)+1,

'')

SELECT @HTMLText

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 5

While looping through the characters in the row.

Find the first open bracket, do not include any data until you have went past the closing bracket.

Note: if the input is consistent it can be even simpler.

<Tag1><tag1.1>yada</tag1.1><Tag1> Consistent meaning the Yada is the only data you are looking for. Loop until you get to a character after '>' that is not '<' use that character string until you get to the next '<' done.

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 6

There are hundreds of tags throughout the column, and the tags are not in the same location in each row.

Any advice?

Thanks

Andrew SQLDBA

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 7

If you had a couple of different sanitized examples it would give a better picture. thx

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 8

Why would a combination of REPLACE(), LIKE() and PATINDEX() not work here?

EDIT: Like in Lowell's example? It's set based, no looping. Looping bad if you don't need to do it.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 9

I forgot to mention this, but the column is a text data type, and replace is not working with Text data type.

Andrew SQLDBA

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 10

AndrewSQLDBA (7/20/2011)
I forgot to mention this, but the column is a text data type, and replace is not working with Text data type.

Can you update it to VARCHAR(MAX) without breaking anything?

Or add a VARCHAR(MAX) column to the table, copy over the text data into the new column, then use the REPLACE()?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 11

Daryl AZ

Hall of Fame

Points: 3087

July 20, 2011 at 11:31 am

#1355899

Cast your text type as a varchar(max)

Lowell SSC Guru Points: 323518 More actions · Answer 12

ok here's a tested working example...that i reworked quickly because you changed the datatype tyo TEXT.

i'm assuming this'll be a one time update, so it might be slow on a MillionBillion rows of data:

--DROP TABLE MySampleData

CREATE TABLE MySampleData (ID INT,HTMLText TEXT)

INSERT INTO MySampleData(ID,HTMLText)

SELECT 1,'NoHTML Inside' UNION ALL

SELECT 2, '<Tag1><tag1.1>yada</tag1.1><Tag1>' UNION ALL

SELECT 3 ,'<OBX.5><OBX.5.1>Blah Blah Data</OBX.5.1></OBX.5>Other Stuff outside of the tags'

DECLARE @TagsExist INT

SET @TagsExist = 1

WHILE @TagsExist > 0

BEGIN

IF NOT EXISTS(SELECT 1 FROM MySampleData WHERE CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)) > 0)

BEGIN

SET @TagsExist = 0

END

UPDATE MySampleData

SET HTMLText = STUFF(CONVERT(VARCHAR(MAX),HTMLText),

CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)),

CHARINDEX('>',CONVERT(VARCHAR(MAX),HTMLText)) - CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText))+ 1,

'')

WHERE CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)) > 0

END --while loop

SELECT * FROM MySampleData

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Daryl AZ Hall of Fame Points: 3087 More actions · Answer 13

Nice snippet Lowell.

That looks like a nice piece of code to place in the tool chest!

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 14

What happens if the text outside the tags containt < or > ?

I never fully tackled this problem, but I know it's far from simple.

Moreover I think that RBARing this might be a good option to try. Right now you're hitting and updating the same row a ton of times. I'd preffer using a itvf or even a simple function, loop in it and then update the base row only once (read / write once max).