Parsing Non-Standard String into a Standard Format

  • I'm working on parsing out strings into a standard format. I have provided the examples of strings below.

    create table #TestString(OldString Varchar(50), NewString Varchar(50))

    insert into #TestString (OldString)

    values ('333-546982'),

    ('2-333-452147'),

    ('^333-439112-232'),

    ('3:231-665456'),

    ('9:222-ts-123432-7')

    I would like to strings to always be in a format {3}-{6}

    For Example,

    2-333-452147 = 333-452147

    9:222-ts-123432-7 = 222-123432

    Thanks!

  • I have a reasonable idea of what you're trying to do... What are the rules for removing non-numeric characters in the string, though?

    What specifically do you mean by "standard" strings? Could you explain what patterns you are trying to remove? All non-numeric characters? Any specific punctuation marks?

    This is getting close. It would be better if I could use a table to specify the list of characters to remove.

    SELECT REPLACE(REPLACE(REPLACE(OldString,'-',''),'^',''),':','') AS NewString

    FROM #TestString;

  • Quick suggestion that looks first for 3 digits and then 6 digits in the remainder of the string

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'tempdb..#TestString') IS NOT NULL DROP TABLE #TestString;

    create table #TestString(OldString Varchar(50), NewString Varchar(50))

    insert into #TestString (OldString)

    values ('333-546982'),

    ('2-333-452147'),

    ('^333-439112-232'),

    ('3:231-665456'),

    ('9:222-ts-123432-7');

    ;WITH BASE_DATA AS

    (

    SELECT

    TS.OldString

    ,PATINDEX('%[0-9][0-9][0-9]%',TS.OldString) AS FIRST_POS

    ,SUBSTRING(TS.OldString,(PATINDEX('%[0-9][0-9][0-9]%',TS.OldString) + 3),8000) AS STR_REMAINDER

    FROM #TestString TS

    )

    SELECT

    BD.OldString

    ,SUBSTRING(BD.OldString,BD.FIRST_POS,3)

    + CHAR(45)

    +SUBSTRING(BD.STR_REMAINDER,PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9]%',BD.STR_REMAINDER),6) AS NUM_STRING

    FROM BASE_DATA BD;

    Output

    OldString NUM_STRING

    ------------------- ----------

    333-546982 333-546982

    2-333-452147 333-452147

    ^333-439112-232 333-439112

    3:231-665456 231-665456

    9:222-ts-123432-7 222-123432

    Edit: removed extra commas in the query.

  • This is fantastic, thank you!

  • joshdbguy (7/11/2016)


    This is fantastic, thank you!

    You are very welcome.

    😎

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply