Still having errors !!!!!!!!!!!!!!!!!!!!

  • I have data that I'm trying to clean and split the string Actual_Extract and get the Type,Code and Amount

    1)

    When I manipulate the ACTUAL_TEXT result I'm still getting the wrong result on some rows. For Example...See below

    '01 759 9991 997 -0.560' Gives the right result = 759 ,it has 3 spaces

    '01 759 9991 997 -0.360' Incomplete just gives a 9 instead of 759 , it has one space

    '01 759 9991 997 -0.160' Incomplete just gives a 9 instead of 759 , it has one space

    '01 759 9991 997 -0.660' Gives the right result = 759,it has 3 spaces

    The spaces are not consistent. They could 1,2,3 etc..Tried replacing the double or more spaces

    with replace(field,Space(2),space(1)) etc..Too many replace statements

    2) Remove the non-numeric Values from the when extracting the Amount. i.e EE,CO,CB etc

    Currently using the replace function. Problem is the non numeric value could be anything

    Is there a way to remove it without using too many replace functions ?

    '-20.000 EE01 6060 3 997' output = -20.000 EE , it should be -20.000

    '-20.000 01 6060 3 997' output = -20.000 CB , it should be -20.000

    3)

    If actual_text = '0997 2247' the type should be NULL,Amount Should be NULL as well as Code because they is

    no monetary Value in the String

    4)

    This data set is over 60 million rows, how do I insert into tbPROCESSED in batches ?

    5)

    Lastly if Actual_text is as listed below. Type,Amount & Code should be NUll

    '33CR RT CHG .000 - 2.000'

    '33CR RT CHG .000 - 2.000'

    '33TI RT CHG 1.000 - .000'

    '33TI RT CHG 1.000 - .000'

  • Ray,

    First, I don't see any rows in your test data that look like the following:

    '-20.000 EE 01 6060 3 997' output = -20.000 EE , it should be -20.000

    '-20.000 01 6060 3 997' output = -20.000 CB , it should be -20.000

    Second, in the rows that look like the following, please identify what the Type, Code, and Amount would be...

    '01 759 9991 997 -0.560'

    '01 6060 3 997 -40.000 CB'

    Third, are we allowed to add columns to the tbExtract table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes sure you can add columns. Is it for Cleaning the data ?

    Actual_Text is currently = '01 759 9991 997 -0.560'

    Desired output = Type should be '01'

    , Code should be 759 and Amount should be -0.560 after extracting it from Actual_Text

    Actual_Text is currently = '01 6060 3 997 -40.000 CB'

    = Type should be '01'

    , Code should be 6060 and Amount should be -40.000.

    Actual_Text is currently = '01 6060 3 997 -50.000 CB'

    = Type should be '01'

    , Code should be 6060 and Amount should be -50.000.

  • Yes sure you can add columns. Is it for Cleaning the data ?

    Yes... and for storing it, too. The following works on the data you provided for the tbExtract table. I would not try to update 60 million rows in a single pass... I'd probably build a loop to use the code below 60 times or so to split the updates to only about a million rows.

    ALTER TABLE tbExtract

    ADD Type VARCHAR(50),

    Code VARCHAR(50),

    Unused1 VARCHAR(50),

    Amount VARCHAR(50),

    Unused2 VARCHAR(50)

    GO

    DECLARE @Dummy VARCHAR(50)

    DECLARE @String VARCHAR(50)

    UPDATE tbExtract

    SET @String = Actual_Text+' ',

    @Dummy = Type = LEFT(@String,PATINDEX('%[0-9] %',@String)),

    @String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),

    @Dummy = Code = LEFT(@String,PATINDEX('%[0-9] %',@String)),

    @String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),

    @Dummy = Unused1 = LEFT(@String,PATINDEX('%[0-9] %',@String)),

    @String = SUBSTRING(@String,PATINDEX('% [0-9]%',@String)+1,50),

    @Dummy = Unused2 = LEFT(@String,PATINDEX('%[0-9] %',@String)),

    @String = SUBSTRING(@String,PATINDEX('% [-0-9]%',@String)+1,50),

    @Dummy = Amount = LEFT(@String,PATINDEX('%[0-9] %',@String))

    WHERE Actual_Text NOT LIKE '%[a-z][a-z][a-z]%'

    AND Actual_Text LIKE '[0-9][0-9] %'

    SELECT * FROM tbEXTRACT

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. Code above assumes all "Amounts" are negative... may have to make a second pass for positive "Amounts".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes there are both negative and positive values in the monetary column.....

  • Ok, Ray... I've changed the code above... minor tweak... should do what you want now... give 'er a try on a test table like the one you posted...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks a lot it works....

  • Cool... thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I seem to be having challenges that are text related..

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply