How to split string and separate the data with numbers and text

  • I am having a set of medication where I am trying to split the data and get the required output, I have the following medications

    ARANESP 100MCG PFS SOLD BY THE
    ARANESP 200MCG PFS SOLD BY THE
    ARANESP 40MCG PFS SOLD BY THE
    ARANESP 60MCG PFS SOLD BY THE
    CALCITRIOL 0.25mcg CAPSULE (EA
    CALCITRIOL CAP .5MCG 100/BT"SO
    CEFAZOLIN SODIUM 1GM EACH MMS
    CEFTAZIDIME INJ 1GM SDV "EACHE
    CINACALCET HCL 30MG 30/BT SLAT
    CINACALCET HCL 60MG 30/BT 100
    CINACALCET HCL 90MG 30/BT
    LEVOFLOXACIN TAB 500MG "SOLD A
    Zemplar Cap 1MCG 30/BT EA

    I am trying to separate the medication, strength and Strength unit  so that the output should be

    ARANESP 100 MCG
    ARANESP 200 MCG
    ARANESP 40 MCG
    ARANESP 60 MCG
    CALCITRIOL 0.25 mcg
    CALCITRIOL CAP .5 MCG
    CEFAZOLIN SODIUM 1 GM
    CEFTAZIDIME INJ 1 GM
    CINACALCET HCL 30 MG
    CINACALCET HCL 60 MG
    CINACALCET HCL 90 MG
    LEVOFLOXACIN TAB 500 MG
    Zemplar Cap 1 MCG
  • I have the sample fiddle here

    http://sqlfiddle.com/#!18/d2781/2

    The last row data I need some modification

     

    MCG MCG MCG MCG mcg MCG GM GM MG MG MG U/ML U/ML CREAM ML uni MG % % MG/ML ML MCG

    • This reply was modified 3 years, 5 months ago by  sqldevlearn.
    • This reply was modified 3 years, 5 months ago by  sqldevlearn.
  • Here is one way that does it:

    declare @t table (name varchar(50))
    insert into @t (name) values
    ('ARANESP 100MCG PFS SOLD BY THE'),
    ('ARANESP 200MCG PFS SOLD BY THE'),
    ('ARANESP 40MCG PFS SOLD BY THE'),
    ('ARANESP 60MCG PFS SOLD BY THE'),
    ('CALCITRIOL 0.25mcg CAPSULE (EA'),
    ('CALCITRIOL CAP .5MCG 100/BT"SO'),
    ('CEFAZOLIN SODIUM 1GM EACH MMS'),
    ('CEFTAZIDIME INJ 1GM SDV "EACHE'),
    ('CINACALCET HCL 30MG 30/BT SLAT'),
    ('CINACALCET HCL 60MG 30/BT 100'),
    ('CINACALCET HCL 90MG 30/BT'),
    ('LEVOFLOXACIN TAB 500MG "SOLD A'),
    ('Zemplar Cap 1MCG 30/BT EA');


    with mycte as (
    select name, charindex (' ', name, patindex('%[0-9,.]%',name)) as LastChar
    from @t
    ),
    ShortStrings as (
    select substring(name,1, LastChar) as ShortSring
    from mycte),
    NewSpaceLocation as (
    select ShortSring, patindex('%[0-9]%',reverse(ShortSring))-1 as LocationLastDigit
    from ShortStrings)
    select left(ShortSring,len(ShortSring)+1-LocationLastDigit) + ' ' + right(ShortSring ,LocationLastDigit)
    from NewSpaceLocation

    Adi

  • Adi Cohn wrote:

    Here is one way that does it:

    declare @t table (name varchar(50))
    insert into @t (name) values
    ('ARANESP 100MCG PFS SOLD BY THE'),
    ('ARANESP 200MCG PFS SOLD BY THE'),
    ('ARANESP 40MCG PFS SOLD BY THE'),
    ('ARANESP 60MCG PFS SOLD BY THE'),
    ('CALCITRIOL 0.25mcg CAPSULE (EA'),
    ('CALCITRIOL CAP .5MCG 100/BT"SO'),
    ('CEFAZOLIN SODIUM 1GM EACH MMS'),
    ('CEFTAZIDIME INJ 1GM SDV "EACHE'),
    ('CINACALCET HCL 30MG 30/BT SLAT'),
    ('CINACALCET HCL 60MG 30/BT 100'),
    ('CINACALCET HCL 90MG 30/BT'),
    ('LEVOFLOXACIN TAB 500MG "SOLD A'),
    ('Zemplar Cap 1MCG 30/BT EA');


    with mycte as (
    select name, charindex (' ', name, patindex('%[0-9,.]%',name)) as LastChar
    from @t
    ),
    ShortStrings as (
    select substring(name,1, LastChar) as ShortSring
    from mycte),
    NewSpaceLocation as (
    select ShortSring, patindex('%[0-9]%',reverse(ShortSring))-1 as LocationLastDigit
    from ShortStrings)
    select left(ShortSring,len(ShortSring)+1-LocationLastDigit) + ' ' + right(ShortSring ,LocationLastDigit)
    from NewSpaceLocation

    Adi

     

    Thanks I had an extra medication which I didn't listed ('Heparin Sodium Injection USP 1') it is failing for it.

  • Also ('EPOGEN 20 000 U/ML MDV 1ML"EA"') is giving as EPOGEN 20  where it should be EPOGEN 20000 U/ML

    declare @t table (name varchar(50))
    insert into @t (name) values
    ('ARANESP 100MCG PFS SOLD BY THE'),
    ('ARANESP 200MCG PFS SOLD BY THE'),
    ('ARANESP 40MCG PFS SOLD BY THE'),
    ('ARANESP 60MCG PFS SOLD BY THE'),
    ('CALCITRIOL 0.25mcg CAPSULE (EA'),
    ('CALCITRIOL CAP .5MCG 100/BT"SO'),
    ('CEFAZOLIN SODIUM 1GM EACH MMS'),
    ('CEFTAZIDIME INJ 1GM SDV "EACHE'),
    ('CINACALCET HCL 30MG 30/BT SLAT'),
    ('CINACALCET HCL 60MG 30/BT 100'),('CINACALCET HCL 90MG 30/BT'),
    ('EPOGEN 10 000/ML MDV 2ML "EA"'),('EPOGEN 20 000 U/ML MDV 1ML"EA"'),
    ('EPOGEN 3000U/ML 1ML "EA" SOLD'),('EPOGEN 4000U/ML 1ML "EA" SOLD'),('GENTAMICIN 0.1% CREAM 15GM 1/E'),
    ('GENTAMICIN OPT SOL 5ML EACH'),('Heparin Sod Inj USP 30 000 uni'),('Heparin Sodium Injection USP 1'),
    ('LEVOFLOXACIN TAB 500MG "SOLD A'),('LIDOCAINE/PRILOCAINE 2.5% 30GM'),('MUPIROCIN CREAM 2% 30GM'),
    ('VENOFER 20MG/ML 5ML (100MG) EA'),('WATER STERILE FTV 10ML "EA" 25'),('Zemplar Cap 1MCG 30/BT EA');


    with mycte as (
    select name, charindex (' ', name, patindex('%[0-9,.]%',name)) as LastChar
    from @t
    ),
    ShortStrings as (
    select substring(name,1, LastChar) as ShortSring
    from mycte),
    NewSpaceLocation as (
    select ShortSring, patindex('%[0-9]%',reverse(ShortSring))-1 as LocationLastDigit
    from ShortStrings)
    select left(ShortSring,len(ShortSring)+1-LocationLastDigit) + ' ' + right(ShortSring ,LocationLastDigit)
    from NewSpaceLocation
  • sqldevlearn wrote:

    I am having a set of medication where I am trying to split the data and get the required output, I have the following medications

    ARANESP 100MCG PFS SOLD BY THE
    ARANESP 200MCG PFS SOLD BY THE
    ARANESP 40MCG PFS SOLD BY THE
    ARANESP 60MCG PFS SOLD BY THE
    CALCITRIOL 0.25mcg CAPSULE (EA
    CALCITRIOL CAP .5MCG 100/BT"SO
    CEFAZOLIN SODIUM 1GM EACH MMS
    CEFTAZIDIME INJ 1GM SDV "EACHE
    CINACALCET HCL 30MG 30/BT SLAT
    CINACALCET HCL 60MG 30/BT 100
    CINACALCET HCL 90MG 30/BT
    LEVOFLOXACIN TAB 500MG "SOLD A
    Zemplar Cap 1MCG 30/BT EA

    I am trying to separate the medication, strength and Strength unit  so that the output should be

    ARANESP 100 MCG
    ARANESP 200 MCG
    ARANESP 40 MCG
    ARANESP 60 MCG
    CALCITRIOL 0.25 mcg
    CALCITRIOL CAP .5 MCG
    CEFAZOLIN SODIUM 1 GM
    CEFTAZIDIME INJ 1 GM
    CINACALCET HCL 30 MG
    CINACALCET HCL 60 MG
    CINACALCET HCL 90 MG
    LEVOFLOXACIN TAB 500 MG
    Zemplar Cap 1 MCG

    You data here is quite a bit different than what you have on SQLFiddle.  Can you provide a stable target, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is the fiddle http://sqlfiddle.com/#!18/334d74/1/0

  •  

    Generic Formula to Get Position of First Number in String:

    =MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},String_Ref&"0123456789")

     

    http://www.dgcustomerfirst.com

    • This reply was modified 3 years, 5 months ago by  Hettinger.
  • Hettinger wrote:

    Generic Formula to Get Position of First Number in String:

    =MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},String_Ref&"0123456789")

    In which dialect of SQL?  That will not work in T-SQL (SQL Server).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Any solution will be difficult due to variations in format, however this may work except for Heparin as I cannot discern which value is which and also it will ignore any rows that do not fit a pattern.

    SELECT LEFT(mt.MedDescription,PATINDEX(p.Pattern,mt.MedDescription)-1)
    ,SUBSTRING(mt.MedDescription,PATINDEX(p.Pattern,mt.MedDescription)+1,p.Number)
    ,SUBSTRING(mt.MedDescription,PATINDEX(p.Pattern,mt.MedDescription)+1+p.Number,200)
    FROM MedTable mt
    JOIN (VALUES
    ('% [0-9][0-9][0-9][a-z][a-z][a-z] %',3),
    ('% [0-9][0-9][a-z][a-z][a-z] %',2),
    ('% [0-9].[0-9][0-9][a-z][a-z][a-z] %',4),
    ('% [0-9][a-z][a-z][a-z] %',1),
    ('% .[0-9][a-z][a-z][a-z] %',2),
    ('% [0-9][a-z][a-z] %',1),
    ('% [0-9][0-9][a-z][a-z] %',2),
    ('% [0-9][0-9][0-9][a-z][a-z] %',3)
    ) p (Pattern,Number)
    ON mt.MedDescription LIKE p.Pattern

    p.s. This solution relies on the description having a trailing space

    • This reply was modified 3 years, 5 months ago by  David Burrows.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • sqldevlearn wrote:

    Here is the fiddle http://sqlfiddle.com/#!18/334d74/1/0

    If you've not solved this yet, especially with the special condition for things like "30_000" where the underscore is actually a space, and no one else has solved it,  I'll try to get to this this weekend.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Missed David's post, which he posted while I was still writing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok.  First, I cleaned up the code that provides the test table and data generation so it's easier to read/verify.  I also changed it from a table variable to a temp table so that it's easier to develop from because the table persists.  Of course, I used my favorite temp table name for "head- pounder" problems. 😀

    Notice the extra row I added at the end.

       DROP TABLE IF EXISTS #MyHead;
    CREATE TABLE #MyHead
    (Name VARCHAR(50))
    ;
    INSERT INTO #MyHead WITH (TABLOCK)
    (Name)
    VALUES ('ARANESP 100MCG PFS SOLD BY THE')
    ,('ARANESP 200MCG PFS SOLD BY THE')
    ,('ARANESP 40MCG PFS SOLD BY THE')
    ,('ARANESP 60MCG PFS SOLD BY THE')
    ,('CALCITRIOL 0.25mcg CAPSULE (EA')
    ,('CALCITRIOL CAP .5MCG 100/BT"SO')
    ,('CEFAZOLIN SODIUM 1GM EACH MMS')
    ,('CEFTAZIDIME INJ 1GM SDV "EACHE')
    ,('CINACALCET HCL 30MG 30/BT SLAT')
    ,('CINACALCET HCL 60MG 30/BT 100')
    ,('CINACALCET HCL 90MG 30/BT')
    ,('EPOGEN 10 000/ML MDV 2ML "EA"')
    ,('EPOGEN 20 000 U/ML MDV 1ML"EA"')
    ,('EPOGEN 3000U/ML 1ML "EA" SOLD')
    ,('EPOGEN 4000U/ML 1ML "EA" SOLD')
    ,('GENTAMICIN 0.1% CREAM 15GM 1/E')
    ,('GENTAMICIN OPT SOL 5ML EACH')
    ,('Heparin Sod Inj USP 30 000 uni')
    ,('Heparin Sodium Injection USP 1')
    ,('LEVOFLOXACIN TAB 500MG "SOLD A')
    ,('LIDOCAINE/PRILOCAINE 2.5% 30GM')
    ,('MUPIROCIN CREAM 2% 30GM')
    ,('VENOFER 20MG/ML 5ML (100MG) EA')
    ,('WATER STERILE FTV 10ML "EA" 25')
    ,('Zemplar Cap 1MCG 30/BT EA')
    ,('A test with no numeric digits') --Added this one
    ;

    With the idea of doing a pattern search (and we could actually do one more level of improvement but this is OK for a demo), there are only 3 patterns that we're dealing with, so far.

    EPOGEN 20 000 U/ML MDV 1ML"EA"
    ^ ^
    | |--- Find2: Begininng of second "possible" word (NULL if not present)
    |
    |------ Find1: Beginning of first word.

    To summarize...
    (Note: order below for ease of understanding and is different than the order in the code for performance reasons)

    If there are no words with digits, return the whole string.

    If the first word with a digit ends with a digit and the next word
    starts with a digit, return both words and everthing to the left of the first word.
    We also remove the space between the two words.

    Otherwise, return the first word with a digit and all the words to the left of it.



    Here's code for that.  You could shorten the conditions in the CASE since CASE is evaluated from the top down for each condition until a condition matches, but that would be confusing to even some people that know that (especially me before coffee).

    WITH cteFind AS
    (
    SELECT Name = CONVERT(CHAR(51),Name)
    ,Find1 = PATINDEX('%[0-9]%',CONVERT(CHAR(51),Name))
    ,Find2 = NULLIF(PATINDEX('%[0-9][ ][0-9]%',CONVERT(CHAR(51),Name)),0)+2
    FROM #MyHead
    )
    SELECT Original = Name
    ,Desired = CASE
    WHEN Find2 IS NULL AND Find1 > 0 THEN LEFT(Name,CHARINDEX(' ',Name,Find1)-1)
    WHEN Find2 > 0 AND Find1 > 0 THEN STUFF(LEFT(Name,CHARINDEX(' ',Name,Find2)-1),Find2-1,1,'')
    WHEN Find1 = 0 THEN Name
    ELSE '***** Unknown Condition *****'
    END
    ,Find1
    ,Find2
    FROM cteFind
    ;

    I can see a couple of other conditions coming up but this satisfies the latest problem definition and makes it pretty easy to easily add others.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply