Pattern Matching

  • Run script in the attachement to generate the data and view the output

    SELECT * FROM tbPROCESSED

    This is the full picture of what I'm trying to do.The problematic field is called Actual_Text which is the string that requires manipulation.

    I'm deriving three fields from field Actual_Text

    fields Type,Code & MonetoryValue

    Field Type is the digits starting from the left and end when it encounters a space

    only where there is a Valid monetoryValue

    ie. '80 98437 3 997 -2.500'

    in this case Type will be = 80

    ie. 'PARK AVENUE MANOR Pr RAZOO 0'

    in this case Type will be = null

    ie. '530719000254'

    in this case Type will be = null

    -----------------------------Code

    Field Code is the digits after the first space starting from the left and ends when it encounters the

    second space ,only where there is a Valid monetoryValue

    ie. '80 98437 3 997 -2.500'

    Code will be = 98437

    ie. 'PARK AVENUE MANOR Pr RAZOO 0'

    Code will be = null

    ie. '530719000254'

    Code will be = null

    ie. '01 6060 3 997 -20.000 CO'

    Code will be = 6060

    ie. '0997 2247'

    Code will be = null because there is no MonetoryValue

    ---MonetoryValue

    I can extract the monetory value only problem is ho do I remove

    the alpha characters without using replace

    i.e '01 6060 3 997 -20.000 CO'

    '-20.000 CO' to exclude CO etc...

    '01 6060 3 997 -50.000 CB'

  • I hope I understood you right, but I think this code should work.

    But be ware that in you sample data there are 3 paces between the monetary values and the code, so I assume in my script that this will be the case.

    SELECT ACTUAL_TEXT,

    CASE

    WHEN LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT))Like '[1-9]%' THEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT) )

    ELSE ''

    END

    AS MonetaryValue,

    CASE

    WHEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT))Like '[1-9]%' THEN

    SUBSTRING(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT)+3,

    PATINDEX('%[^0-9]%',RIGHT(ACTUAL_TEXT,(LEN(ACTUAL_TEXT) - PATINDEX('%[^0-9]%',Actual_TEXT)-3))))

    ELSE ''

    END as code

    FROM [dbo].[tbEXTRACT]

    Markus

    [font="Verdana"]Markus Bohse[/font]

  • Thanks,

    I changed it slightly and it now picks up a few more. problem is that

    the spaces are not always 3 spaces sometimes its 2 sometimes 3 sometimes 1 space.

    I have put comments in the spreadsheet. Thanks for your help man at least I'm begining to

    get the picture now.

    SELECT ACTUAL_TEXT,

    CASE

    WHEN LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT))Like '[0-9]%' THEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT) )

    ELSE ''

    END

    AS [Type],

    CASE

    WHEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT))Like '[0-9]%' THEN

    SUBSTRING(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT)+3,

    PATINDEX('%[^0-9]%',RIGHT(String,(LEN(ACTUAL_TEXT) - PATINDEX('%[^0-9]%',ACTUAL_TEXT)-3))))

    WHEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT))Like '[0-9]%' THEN

    SUBSTRING(ACTUAL_TEXT,PATINDEX('%[^0-9]%',ACTUAL_TEXT)+2,

    PATINDEX('%[^0-9]%',RIGHT(ACTUAL_TEXT,(LEN(ACTUAL_TEXT) - PATINDEX('%[^0-9]%',ACTUAL_TEXT)-2))))

    ELSE ''

    END AS Code

    FROM tbEXTRACT

  • Why are you still fighting this problem, Ray? I gave you an answer and code that resolves all of these problems last night...

    http://www.sqlservercentral.com/Forums/Topic425054-149-1.aspx#bm425122

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There were no spaces in your original sample data. If that continues to be a problem, let me know... the fix is simple...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry about that man. I didn't realise they were no spaces in the data that I sent last night. There spaces and thats why maybe its doesn't strip out all the data.. You have been a great help...Can you help me take care of those spaces....

  • Roger that... fix is simple... same code except where highlighted... will take care of up to 16 spaces between elements...

    SELECT s.RowID,

    Type = MAX(CASE WHEN s.Posit =1 AND LEN(s.SplitString) =2 THEN s.SplitString ELSE NULL END),

    Code = MAX(CASE WHEN s.Posit =2 AND LEN(s.SplitString)<=6 THEN s.SplitString ELSE NULL END),

    MonetaryValue = MAX(CASE WHEN s.Posit =5 THEN s.SplitString ELSE NULL END)

    FROM (--==== Derived table "s" splits all of the strings and marks the position (ordinal index) of each

    SELECT h.RowID,

    SplitString = SUBSTRING(' '+h.String+' ', t.N+1, CHARINDEX(' ', ' '+h.String+' ', t.N+1)-t.N-1),

    Posit = t.N-LEN(REPLACE(LEFT(' '+h.String+' ',t.N), ' ', ''))

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case String is NULL

    (SELECT RowID,

    String = REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    String

    ,' ',' ')

    ,' ',' ')

    ,' ',' ')

    ,' ',' ')

    FROM #Test) h

    ON SUBSTRING(' '+h.String+' ', t.N, 1) = ' '

    AND t.N < LEN(' '+h.String+' ')

    AND h.String NOT LIKE '%[_]%'

    )s

    WHERE s.Posit IN (1,2,5)

    GROUP BY s.RowID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Couldn't combine your last script with the rest of the script I have so I used the this. How can I combine it Should I decide to use it in the one select statement I have. Ilike the (ordinal index) part as it enhances performance

    Thank you very much its beginning to make sense now. Just make sure that I understand how the code works. Sorry for backtracking I just want to ensure that

    I understand the code so that next time I know what I'm doing...

    Please confirm if I'm on the right track.

    1.To extract the Type. You are checking from the left if the characters are numeric. If so its extracting that. Because the space is

    not numeric it stops there and extracts just the numeric data before the space else it returns a space

    2. With the MonetoryValue is there a way of excluding the alpha characters without using the replace functions ?

    3. if I don't specify in the where clause :- WHERE ACTUAL_TEXT != 'M.A.N_SUPERMKT KI NG WILLIAM D'

    or AND ACTUAL_TEXT NOT LIKE '%[_]%'

    I get the error listed below.

    Msg 536, Level 16, State 5, Line 5

    Invalid length parameter passed to the SUBSTRING function.

    Is there a way around this without the where clause because I want to return all the results

    and rather have nulls in the funny data

    SELECT

    CAST(RTRIM(LTRIM(INST_NO)) AS CHAR(3)) AS INST_NO

    -- Remove - 0 at the end of the acct_no

    ,REPLACE(CAST(RTRIM(LTRIM(ACCT_NO)) AS VARCHAR(16)),'-','') AS ACCT_NO

    ,CAST(RTRIM(LTRIM(REC_NO)) AS VARCHAR(9 )) AS REC_NO

    -- Get The date format Required

    ,CONVERT(VARCHAR(10),TRANS_DATE,120) AS TRANS_DATE

    ,CONVERT(VARCHAR(10),POST_DATE,120) AS POST_DATE

    ,CONVERT(INT,POST_TIME) AS POST_TIME

    ,CONVERT(VARCHAR(60),TRANS_DESC) AS TRANS_DESC

    -- Extract the Type

    ,CASE

    WHEN LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT))LIKE '[0-9]%' THEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT) )

    ELSE ''

    END

    AS [TYPE],

    -- Please add a comment to explain the +3 and the -3 ??????????????????

    CASE

    WHEN

    LEFT(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT))LIKE '[0-9]%' THEN

    SUBSTRING(ACTUAL_TEXT,PATINDEX('%[^0-9]%',Actual_TEXT)+3,

    PATINDEX('%[^0-9]%',RIGHT(ACTUAL_TEXT,(LEN(ACTUAL_TEXT) - PATINDEX('%[^0-9]%',Actual_TEXT)-3))))

    ELSE ''

    END AS CODE

    ,MONETORYVALUE=

    CASE

    WHEN CHARINDEX('.',REVERSE(ACTUAL_TEXT)) = 0 THEN NULL ELSE

    REVERSE(SUBSTRING(REVERSE(ACTUAL_TEXT), CHARINDEX('.',REVERSE(ACTUAL_TEXT))+1,

    CHARINDEX(' ',REVERSE(ACTUAL_TEXT),CHARINDEX('.',REVERSE(ACTUAL_TEXT))) - CHARINDEX('.',REVERSE(ACTUAL_TEXT)))) END +

    CASE

    WHEN CHARINDEX('.',ACTUAL_TEXT) =0 THEN NULL ELSE

    REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(ACTUAL_TEXT,CHARINDEX('.',ACTUAL_TEXT),LEN(ACTUAL_TEXT) -CHARINDEX('.',ACTUAL_TEXT)+1)

    ,'CB',''),'FC',''),'CO',''),'EE','')

    END

    ,LTRIM(RTRIM(ACTUAL_TEXT)) AS ACTUAL_TEXT

    FROM

    dbo.tbEXTRACT

    WHERE ACTUAL_TEXT NOT LIKE '%[_]%' ---!= 'M.A.N_SUPERMKT KI NG WILLIAM D'

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply