June 6, 2006 at 1:12 pm
Can someone please help me? I've been looking at this data and my eyes and head now hurt like hell. Here is my dilema:
I have a free text field, that I need to parse out to various possible usable values.
These values I need to extract are DrugName, Strength, Form.
Here is a sample of the data I am looking at (Table: DrugNameTable):
DrugName
CIPROFLOXACIN 500 MG TAB
METOPROLOL TARTRATE 25 MG TAB
VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML
VITAMIN E 30 IU/GM OINT TUBE
What I need to try to accomplish with respect to record #1 would look like the following:
DrugName = CIPROFLOXACIN
Strength = 500MG
Form = TAB
What I need to try to accomplish with respect to record #2 would look like the following:
DrugName = METOPROLOL TARTRATE
Strength = 25MG
Form = TAB
What I need to try to accomplish with respect to record #3 would look like the following:
DrugName = VINORELBINE TARTRATE
Strength = 35MG
Form = INJ 50ML
What I need to try to accomplish with respect to record #4 would look like the following:
DrugName = VITAMIN E
Strength = 30IU/GM
Form = OINT TUBE
Can someone please help me with this. I have tried parsing it by space, which doesnt work obviously. I replaced the ' MG' with 'MG' which simplifies part of the issue. I have just spent a day looking for any possible solution to this other than someone else doing manual work in Excel.
Thanks in advance,
Steve
June 7, 2006 at 12:30 am
Try this:
CREATE TABLE DrugNameTable (DrugName varchar(100))
GO
INSERT INTO DrugNameTable (DrugName)
SELECT 'CIPROFLOXACIN 500 MG TAB'
UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB'
UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML'
UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE'
GO
SELECT CASE WHEN DrugName LIKE '%MG%' THEN
RTRIM(LEFT(DrugName,CHARINDEX('MG',DrugName)-5))
ELSE RTRIM(LEFT(DrugName,CHARINDEX(' ',DrugName,9)))
END AS DrugName
, CASE WHEN DrugName LIKE '%MG%' THEN
LTRIM(RTRIM(SUBSTRING(DrugName,CHARINDEX('MG',DrugName)-4,7)))
ELSE LTRIM(RTRIM(SUBSTRING(DrugName,CHARINDEX(' ',DrugName,9),9)))
END AS Strength
, CASE WHEN DrugName LIKE '%TAB' THEN
LTRIM(RIGHT(RTRIM(DrugName),3))
ELSE LTRIM(RTRIM(RIGHT(RTRIM(DrugName),9)))
END AS Form
FROM DrugNameTable
GO
Andy
June 7, 2006 at 1:36 am
Hi Steve,
I was preparing a parser for you while Andy posted the better solution.
Anyhow, if you can't predict the literals each time, here is the parser I wrote: (assuming no drug name contains numeric characters and UOMs (Units of Measure) have no spaces in the name and that there is always a space after UOM name.) You may have to trim the "Form" field a bit.
Afterward you might select distinct on each field to identify those which broke parser assumptions. Hopefully not too many.
CREATE TABLE #DrugTable ([TextField] varchar(500),[DrugName] varchar(255),[Strength] varchar(50),[Form] varchar(255))
GO
INSERT INTO #DrugTable (TextField,DrugName,Strength,Form)
--------------------------------------------------------------------------------LIVE
--SELECT Drugname,'','','' FROM DrugNameTable
--------------------------------------------------------------------------------TEST
SELECT 'CIPROFLOXACIN 500 MG TAB','','',''
UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB','','',''
UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML','','',''
UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE','','',''
------------------------------------------------------------------------------------
GO
declare @MYFIELD varchar(4000),
@MYCHAR integer,
@MYSUBSTR varchar(1000),
@MYCONTROL integer
DECLARE @MYFREETEXTFIELD varchar(4000)
DECLARE mytable CURSOR FOR
select TextField from #DrugTable
OPEN mytable
FETCH NEXT FROM mytable INTO @MYFREETEXTFIELD
WHILE (@@FETCH_STATUS = 0)
BEGIN
set @MYFIELD = @MYFREETEXTFIELD
set @MYCHAR = 0
set @MYCONTROL = 0
set @MYSUBSTR = ''
while @MYCHAR < len(@MYFIELD)
begin
set @MYCHAR = @MYCHAR + 1
--SET CONTROL PHASES
--set control to rest of field
if @MYCONTROL = 2 and substring(@MYFIELD,@MYCHAR,1) = ' '
begin
update #DrugTable set Strength = @MYSUBSTR where TextField = @MYFIELD
set @MYSUBSTR = ''
set @MYCONTROL = 3
end
--set control to quantity unit
if @MYCONTROL = 1 and substring(@MYFIELD,@MYCHAR,1) = ' '
begin
set @MYCONTROL = 2
end
--set control to quantity
if @MYCONTROL = 0 and isnumeric(substring(@MYFIELD,@MYCHAR,1)) = 1
begin
update #DrugTable set DrugName = @MYSUBSTR where TextField = @MYFIELD
set @MYSUBSTR = ''
set @MYCONTROL = 1
end
--DO OUTPUT BUILDS
--build drug name
if @MYCONTROL = 0
begin
set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)
end
--build quantity and unit
if @MYCONTROL = 1 or @MYCONTROL = 2
begin
set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)
end
--build rest of field
if @MYCONTROL = 3
begin
set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)
end
end
update #DrugTable set Form = right(@MYSUBSTR,len(@MYSUBSTR) - 1) where TextField = @MYFIELD
FETCH NEXT FROM mytable INTO @MYFREETEXTFIELD
END
CLOSE mytable
DEALLOCATE mytable
GO
--------------------------------------------------------------------------------LIVE
--insert or append to actual table from #DrugTable as required
--------------------------------------------------------------------------------TEST
select * from #DrugTable
------------------------------------------------------------------------------------
GO
DROP TABLE #DrugTable
GO
June 7, 2006 at 5:38 am
Here's a set-based version of Ian's suggestion. It's not pretty, but I don't think anything you do will be pretty.
As with Ian's suggestion, I haven't removed 'IN SODIUM CHLORIDE 0.9% '. You'll have to figure out your own rules for that, but the techniques you've seen in the replies should give you an idea of how to do the parsing.
--data
declare @DrugNameTable TABLE (DrugName varchar(100))
INSERT INTO @DrugNameTable (DrugName)
SELECT 'CIPROFLOXACIN 500 MG TAB'
UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB'
UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML'
UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE'
--calculation
select s, DrugName,
left(FirstNumberOnwards, FirstSpaceAfterFirstNumber + SecondSpaceAfterFirstNumber - 1) as Strength,
substring(FirstNumberOnwards, FirstSpaceAfterFirstNumber + SecondSpaceAfterFirstNumber + 1, 100) as Remainder
from (
select s, DrugName, FirstNumberOnwards, FirstSpaceAfterFirstNumber,
patindex('% %', substring(FirstNumberOnwards, FirstSpaceAfterFirstNumber + 1, 100)) as SecondSpaceAfterFirstNumber
from (
select s, DrugName, FirstNumberOnwards,
patindex('% %', FirstNumberOnwards) as FirstSpaceAfterFirstNumber
from (
select s, left(s, FirstNumber-2) as DrugName, substring(s, FirstNumber, 100) as FirstNumberOnwards
from (
select DrugName as s, patindex('%[0-9]%', DrugName) as FirstNumber
from @DrugNameTable) a) b) c) d
/*results
DrugName Strength Remainder
--------------------- --------- ---------------------------------
CIPROFLOXACIN 500 MG TAB
METOPROLOL TARTRATE 25 MG TAB
VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML
VITAMIN E 30 IU/GM OINT TUBE
*/
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 7, 2006 at 5:48 am
Thanks all for helping out here. I'm going to give them all a try today and see which one will yeild the best results re manual work for the Data Entry folks.
Thanks again for your time on this.
Steve
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply