how to break phone and exension data

  • I'm looking for an sql function that can brean following phone string and retun phone no and extension independently. not all r here but i have some similarity of these:

    there are almost 500K phone contacts like this which i have to fix in their respective fields individually

    i.e.

    select Udf_GetPhone('913.135.9000 x:6390') will return 913.135.9000

    select Udf_GetExt('913.135.9000 x:6390') will return 6390

    following is some same:

    913.135.9000 x:6390

    (305)529-6911x2101

    (206)965-2626

    109.391.3261

    201 933-3101 ext 6696

    (919) 333-9395 x252

    190-325-3121 Ext. 5096

    (313) 196-1000, ex. 63936

    (901) 123-3990 ext. 1209

    305.263.0311

    223 123 1922

    301-191-3255

    619 932 6660

    (939)651-3631

    913-931-9300

    609-665-5312

    512.322.1396

    (101) 261-1911 x 6033

    316-361-5900 ext 12021

    223-123-1105

    903-202-3339 x23031

    631-261-5500

    (950) 993-1151....9999

    939-599-2100 x11200

    915-939-5500, x2139

    Shamshad Ali.

  • Is the phone number (without the extension) always the same number of figures, ignoring brackets, full stops, etc?

    It looks like it from the sample data you hae provided.

    If so I would remove all non-numeric characters, then use Left() to pick up the phone number - the remainder would be the extension

    ------------------------------------------------------------------------
    Bite-sized fiction (with added teeth) [/url]

  • Hi Ali,

    Hope this may help you.

    DECLARE @VARIABLE AS VARCHAR(50)

    SET @VARIABLE = UPPER('(901) 123-3990 ext. 1209')

    --SET @VARIABLE = UPPER('(305)529-6911x2101')

    --SET @VARIABLE = UPPER('(313) 196-1000, ex. 63936')

    IF(CHARINDEX('EXT',@VARIABLE) > 0)

    SELECT REPLACE(SUBSTRING(@VARIABLE, 0, CHARINDEX('EXT',@VARIABLE)), ',', '') AS PHONENUMBER

    , REPLACE(SUBSTRING(@VARIABLE, CHARINDEX('EXT',@VARIABLE) + 3, LEN(@VARIABLE)),'.','') AS EXTENSION

    ELSE IF(CHARINDEX('EX',@VARIABLE) > 0)

    SELECT REPLACE(SUBSTRING(@VARIABLE, 0, CHARINDEX('EX',@VARIABLE)),',','') AS PHONENUMBER

    , REPLACE(SUBSTRING(@VARIABLE, CHARINDEX('EX',@VARIABLE) + 2, LEN(@VARIABLE)),'.','') AS EXTENSION

    ELSE IF(CHARINDEX('X',@VARIABLE) > 0)

    SELECT REPLACE(SUBSTRING(@VARIABLE, 0, CHARINDEX('X',@VARIABLE)),',','') AS PHONENUMBER

    , REPLACE(SUBSTRING(@VARIABLE, CHARINDEX('X',@VARIABLE) + 1, LEN(@VARIABLE)),'.','') AS EXTENSION

    I have used substring, charindex and replace function to get your result.

    Regards,

    Ashok S

  • Thanks - Highly appreciated your help.

    when i use this 512.322.1396, the logic don't return anything. also same with 713-326-XXXX as input it should return only numbers

    could u plz see this and also i would like if it always return phone number in one standard format format like

    7138754885

    Shamshad Ali.

  • I'd use something like this in your UDF:

    declare @val varchar(50)

    declare @p varchar(10) -- phone number

    declare @e varchar(10) -- extension

    set @val = '(950) 993-1151....9999'

    WHILE PATINDEX('%[^0-9]%', @val) > 0

    SET @val = REPLACE(@val, SUBSTRING(@val,PATINDEX('%[^0-9]%', @val),1),'')

    select @val

    set @p = left(@val, 10)

    select @p

    set @e = substring(@val, 11, len(@val) - 10)

    select @e

    As I said before though, this depends on the phone number always being made up of the same number of digits

    ------------------------------------------------------------------------
    Bite-sized fiction (with added teeth) [/url]

  • here's how i would do it.

    I'm assuming that you are using US format for area codes..(800) xxx-yyyy ext.zzzz

    some of your area code exampels started with 1, i.e. 109, which does not really exist.

    i'm assumming everything can be calculated based on the string length..

    it may not be obvious, but the end results, you could take any of the cases that assume 'ext' in them, and have that in your separate UDF for extensions.

    hope this gets the ideas accross.

    here's what i did.

    two functions..one to strip non-numeric chars, and another to format to my pattern.

    --assuming you already have a tally table:

    CREATE FUNCTION StripNonNumeric(@OriginalText VARCHAR(8000))

    RETURNS VARCHAR(8000)

    BEGIN

    DECLARE @CleanedText VARCHAR(8000)

    SELECT @CleanedText = ISNULL(@CleanedText,'') +

    CASE

    --ascii numbers are 48(for '0') thru 57 (for '9')

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57 THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END

    FROM dbo.Tally WHERE Tally.N <= LEN(@OriginalText)

    RETURN @CleanedText

    END

    GO

    CREATE FUNCTION [FormatPhone] (@PhoneString varchar(50))

    RETURNS varchar(50)

    AS

    BEGIN

    DECLARE @FinalString varchar(50)

    SET @FinalString = ''

    --strip out any existing formatting:

    SELECT @PhoneString = dbo.StripNonNumeric(@PhoneString)

    IF LEFT(@PhoneString,1) = '1' -- "1-(800)" format

    BEGIN

    SET @PhoneString = SUBSTRING(@PhoneString,2,50)

    SET @FinalString = '1-'

    END

    SELECT @FinalString =

    CASE

    --less than 7 characters? just return the string

    WHEN LEN(@PhoneString) < 7

    THEN @FinalString + @PhoneString

    --exactly 7 xxx-xxxx

    WHEN LEN(@PhoneString) = 7

    THEN @FinalString + STUFF(@PhoneString,4,0,'-')

    --less than 10 xxx-xxxx ext yy

    WHEN LEN(@PhoneString) < 10

    THEN @FinalString + STUFF(STUFF(@PhoneString,4,0,'-') ,9,0,' ext ')

    --exactly 10 (yyy) xxx-xxxx

    WHEN LEN(@PhoneString) = 10

    THEN @FinalString + STUFF(STUFF(STUFF(@PhoneString,4,0,') '),1,0,' ('),10,0,'-')

    --more than 10 (yyy) xxx-xxxx ext y+

    WHEN LEN(@PhoneString) > 10

    THEN @FinalString + STUFF(STUFF(STUFF(STUFF(@PhoneString,4,0,') '),1,0,' ('),11,0,'-'),16,0,' ext. ')

    END

    RETURN @FinalString

    END

    GO

    and here's a sample data and script to show you the results:

    declare @PhoneNumbers table(Num varchar(50))

    insert into @PhoneNumbers

    SELECT '1-800 427-4567 extension 456780' UNION ALL

    SELECT '470' UNION ALL

    SELECT '555-4561 x 1' UNION ALL

    SELECT '913.135.9000 x:6390' UNION ALL

    SELECT '(305)529-6911x2101' UNION ALL

    SELECT '(206)965-2626' UNION ALL

    SELECT '109.391.3261' UNION ALL

    SELECT '201 933-3101 ext 6696' UNION ALL

    SELECT '(919) 333-9395 x252' UNION ALL

    SELECT '190-325-3121 Ext. 5096' UNION ALL

    SELECT '(313) 196-1000, ex. 63936' UNION ALL

    SELECT '(901) 123-3990 ext. 1209' UNION ALL

    SELECT '305.263.0311' UNION ALL

    SELECT '223 123 1922' UNION ALL

    SELECT '301-191-3255' UNION ALL

    SELECT '619 932 6660' UNION ALL

    SELECT '(939)651-3631' UNION ALL

    SELECT '913-931-9300' UNION ALL

    SELECT '609-665-5312' UNION ALL

    SELECT '512.322.1396' UNION ALL

    SELECT '(101) 261-1911 x 6033' UNION ALL

    SELECT '316-361-5900 ext 12021' UNION ALL

    SELECT '223-123-1105' UNION ALL

    SELECT '903-202-3339 x23031' UNION ALL

    SELECT '631-261-5500' UNION ALL

    SELECT '(950) 993-1151....9999' UNION ALL

    SELECT '939-599-2100 x11200' UNION ALL

    SELECT '915-939-5500, x2139'

    SELECT num,dbo.StripNonNumeric(Num),dbo.FormatPhone(Num) from @PhoneNumbers

    and here are my results:

    Orig Stripped Formatted

    1-800 427-4567 exten 18004274567456780 1- (800) 427-4567 ext. 456780

    470 470 470

    555-4561 x 1 55545611 555-4561 ext 1

    913.135.9000 x:6390 91313590006390 (913) 135-9000 ext. 6390

    (305)529-6911x2101 30552969112101 (305) 529-6911 ext. 2101

    (206)965-2626 2069652626 (206) 96-52626

    109.391.3261 1093913261 1-093-9132 ext 61

    201 933-3101 ext 669 20193331016696 (201) 933-3101 ext. 6696

    (919) 333-9395 x252 9193339395252 (919) 333-9395 ext. 252

    190-325-3121 Ext. 50 19032531215096 1- (903) 253-1215 ext. 096

    (313) 196-1000, ex. 313196100063936 (313) 196-1000 ext. 63936

    (901) 123-3990 ext. 90112339901209 (901) 123-3990 ext. 1209

    305.263.0311 3052630311 (305) 26-30311

    223 123 1922 2231231922 (223) 12-31922

    301-191-3255 3011913255 (301) 19-13255

    619 932 6660 6199326660 (619) 93-26660

    (939)651-3631 9396513631 (939) 65-13631

    913-931-9300 9139319300 (913) 93-19300

    609-665-5312 6096655312 (609) 66-55312

    512.322.1396 5123221396 (512) 32-21396

    (101) 261-1911 x 603 10126119116033 1- (012) 611-9116 ext. 033

    316-361-5900 ext 120 316361590012021 (316) 361-5900 ext. 12021

    223-123-1105 2231231105 (223) 12-31105

    903-202-3339 x23031 903202333923031 (903) 202-3339 ext. 23031

    631-261-5500 6312615500 (631) 26-15500

    (950) 993-1151....99 95099311519999 (950) 993-1151 ext. 9999

    939-599-2100 x11200 939599210011200 (939) 599-2100 ext. 11200

    915-939-5500, x2139 91593955002139 (915) 939-5500 ext. 2139

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • ok i think i'm wrong about area codes between 100 and 199(starting with 1), so you might want to remove the logic i put in place to assume starting with 1 means the 1-(xxx) yyy-zzzz ext aaaaa format

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (3/20/2009)


    ok i think i'm wrong about area codes between 100 and 199(starting with 1), so you might want to remove the logic i put in place to assume starting with 1 means the 1-(xxx) yyy-zzzz ext aaaaa format

    Nah... you were correct. The first area code for public use is 201. Still, I wouldn't get phone number validation mixed in with cleaning up phone numbers. In fact, if I were cleaning them for my company, I certainly wouldn't clean them just to have them as a 10 digit blob of numbers with separators... I'd split them into 3, 3, 4, + digits for NPA, NXX, Line, and Extension. It's actually easier to do that than to clean them up and reformat them. The formatting should probably be done on the GUI side of the house.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok Shamshad... you've got almost 400 points to your name. That means you've been around long enough to know how to post data for sample data to make it easier on us. If you don't know how, then please study the article at the link in my signature below and study the following code because that's the best way to post sample data like what you have. Why? Because we can use it to test our own code so you get a better answer, quicker. 😉

    Here's the test data/table...

    SELECT '913.135.9000 x:6390' AS FullPhoneNumber

    INTO #OriginalTable UNION ALL

    SELECT '(305)529-6911x2101' UNION ALL

    SELECT '(206)965-2626' UNION ALL

    SELECT '109.391.3261' UNION ALL

    SELECT '201 933-3101 ext 6696' UNION ALL

    SELECT '(919) 333-9395 x252' UNION ALL

    SELECT '190-325-3121 Ext. 5096' UNION ALL

    SELECT '(313) 196-1000, ex. 63936' UNION ALL

    SELECT '(901) 123-3990 ext. 1209' UNION ALL

    SELECT '305.263.0311' UNION ALL

    SELECT '223 123 1922' UNION ALL

    SELECT '301-191-3255' UNION ALL

    SELECT 'My new phone # is Area Code 637, my number is 5551515, and I''m at extension 1234. Got that?' UNION ALL

    SELECT 'Yep... and mine is (842) number 444-1212.' UNION ALL

    SELECT '619 932 6660' UNION ALL

    SELECT '(939)651-3631' UNION ALL

    SELECT '913-931-9300' UNION ALL

    SELECT '609-665-5312 Extension 2-8-4444 'UNION ALL

    SELECT '512.322.1396' UNION ALL

    SELECT '(101) 261-1911 x 6033' UNION ALL

    SELECT '316-361-5900 ext 12021' UNION ALL

    SELECT '223-123-1105' UNION ALL

    SELECT '903-202-3339 x23031' UNION ALL

    SELECT '631-261-5500' UNION ALL

    SELECT '(950) 993-1151....9999' UNION ALL

    SELECT '939-599-2100 x11200' UNION ALL

    SELECT '915-939-5500, x2139'

    Notice that I threw in a couple of huge curve balls on the data... if you're going to clean something pretty much standardized like a phone number (in the "NANPA" areas, always 10 digits followed by an extension or not), you might as well expect the unexpected when it comes to stuff you might have to remove. This does NOT, however, validate the phone numbers or strip leading 1's... that should be done by something else.

    Now, I suppose you can use a UDF to make life easier in the future... but, I wanna show you how to clean the whole bloody table all at once... and, yes, you do need a Tally table for the following code...

    SELECT FullPhoneNumber,

    t.N AS DigitPosition,

    SUBSTRING(FullPhoneNumber,t.N,1) AS Digit

    INTO #CleanTable

    FROM dbo.Tally t

    CROSS JOIN #OriginalTable yt

    WHERE t.N <= LEN(FullPhoneNumber)

    AND SUBSTRING(FullPhoneNumber,t.N,1) LIKE '[0-9]'

    SELECT c1.FullPhoneNumber,

    STUFF(

    STUFF(

    LEFT(

    (

    SELECT ''+c2.Digit

    FROM #CleanTable c2

    WHERE c1.FullPhoneNumber = c2.FullPhoneNumber

    ORDER BY c2.DigitPosition

    FOR XML PATH('')

    )

    ,10) --Left 10 of all NANPA phone numbers for NPA NXX LINE

    ,7,0,'.') --Add period between NXX and LINE

    ,4,0,'.') --Add period between NPA and NXX

    AS CleanNumber,

    SUBSTRING(

    (

    SELECT ''+c2.Digit

    FROM #CleanTable c2

    WHERE c1.FullPhoneNumber = c2.FullPhoneNumber

    ORDER BY c2.DigitPosition

    FOR XML PATH('')

    )

    ,11,1000)

    AS CleanExtension

    FROM #CleanTable c1

    GROUP BY c1.FullPhoneNumber

    I use a Temp table as interim storage instead of a CTE because the CTE would have to be referenced at least 3 times... that also means that the CTE would be executed 3 times.

    And, if you need to know how a Tally table works and how to build it correctly, take a look at the following article...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    If you want to see other articles about splitting and rebuilding parameters and the like, just click on my name in the Tally table article header.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nope... bad move on my part... it's going to have to be a function because there's no guarantee that you won't have duplicate phone numbers after the split and that makes for some pretty ugly and long (not to mention, wrong, extensions). I'll be back in a minute.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... again... here's the test data...

    drop table #OriginalTable

    go

    SELECT '913.135.9000 x:6390' AS FullPhoneNumber

    INTO #OriginalTable UNION ALL

    SELECT '(305)529-6911x2101' UNION ALL

    SELECT '(206)965-2626' UNION ALL

    SELECT '109.391.3261' UNION ALL

    SELECT '201 933-3101 ext 6696' UNION ALL

    SELECT '(919) 333-9395 x252' UNION ALL

    SELECT '190-325-3121 Ext. 5096' UNION ALL

    SELECT '(313) 196-1000, ex. 63936' UNION ALL

    SELECT '(901) 123-3990 ext. 1209' UNION ALL

    SELECT '305.263.0311' UNION ALL

    SELECT '223 123 1922' UNION ALL

    SELECT '301-191-3255' UNION ALL

    SELECT 'My new phone # is Area Code 637, my number is 5551515, and I''m at extension 1234. Got that?' UNION ALL

    SELECT 'Yep... and mine is (842) number 444-1212.' UNION ALL

    SELECT '619 932 6660' UNION ALL

    SELECT '(939)651-3631' UNION ALL

    SELECT '913-931-9300' UNION ALL

    SELECT '609-665-5312 Extension 2-8-4444 'UNION ALL

    SELECT '512.322.1396' UNION ALL

    SELECT '512.322.1396' UNION ALL

    SELECT '(101) 261-1911 x 6033' UNION ALL

    SELECT '316-361-5900 ext 12021' UNION ALL

    SELECT '223-123-1105' UNION ALL

    SELECT '903-202-3339 x23031' UNION ALL

    SELECT '631-261-5500' UNION ALL

    SELECT '(950) 993-1151....9999' UNION ALL

    SELECT '939-599-2100 x11200' UNION ALL

    SELECT '915-939-5500, x2139'

    Here's the functions...

    CREATE FUNCTION dbo.Udf_GetPhone

    (@FullPhoneNumber VARCHAR(200))

    RETURNS CHAR(12)

    AS

    BEGIN

    DECLARE @CleanPhone VARCHAR(200)

    SELECT @CleanPhone = ''

    SELECT @CleanPhone = @CleanPhone + SUBSTRING(@FullPhoneNumber,t.N,1)

    FROM dbo.Tally t

    WHERE t.N <= LEN(@FullPhoneNumber)

    AND SUBSTRING(@FullPhoneNumber,t.N,1) LIKE '[0-9]'

    RETURN (SELECT STUFF(STUFF(LEFT(@CleanPhone,10),7,0,'.'),4,0,'.'))

    END

    GO

    CREATE FUNCTION dbo.Udf_GetExtension

    (@FullPhoneNumber VARCHAR(200))

    RETURNS CHAR(12)

    AS

    BEGIN

    DECLARE @CleanPhone VARCHAR(200)

    SELECT @CleanPhone = ''

    SELECT @CleanPhone = @CleanPhone + SUBSTRING(@FullPhoneNumber,t.N,1)

    FROM dbo.Tally t

    WHERE t.N <= LEN(@FullPhoneNumber)

    AND SUBSTRING(@FullPhoneNumber,t.N,1) LIKE '[0-9]'

    RETURN (SUBSTRING(@CleanPhone,11,20))

    END

    And here's how to use them...

    SELECT FullPhoneNumber,

    dbo.Udf_GetPhone(FullPhoneNumber) AS CleanPhoneNumber,

    dbo.Udf_GetExtension(FullPhoneNumber) AS CleanExtension

    FROM #OriginalTable

    It may be even faster if you make a table valued function to return both the phone number and the extension on a single row and do a CROSS APPLY... I won't do that, though, because, like I said, I'd split it down to the NPA/NXX/LINE/Extension columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello Friends,

    Infront of u biggiesss i m too small to write antyhing but ,,, still i tried and i got some ansewer....... here is the code .....

    I have considered in any case first ten digit will be the phone number.........

    DECLARE @VARIABLE AS VARCHAR(50)

    SET @VARIABLE = UPPER('913.135.9000 x:6390')

    DECLARE @I INT

    SET @I=LEN(@VARIABLE)

    PRINT @I

    WHILE @I<>0

    BEGIN

    IF SUBSTRING(@VARIABLE,@I,1) NOT BETWEEN '0' AND '9'

    BEGIN

    SET @VARIABLE=REPLACE(@VARIABLE,SUBSTRING(@VARIABLE,@I,1),' ')

    END

    SET @I=@I-1

    END

    SET @VARIABLE=REPLACE(@VARIABLE,' ','')

    --style 1

    SELECT SUBSTRING(@VARIABLE,1,10) AS PHONE_NUMBER,

    SUBSTRING(@VARIABLE,11,LEN(@VARIABLE)) AS EXT_NUMBER

    ---style 2

    SELECT '('+SUBSTRING(@VARIABLE,1,3)+')'+'('+SUBSTRING(@VARIABLE,4,3)+')'+'('+SUBSTRING(@VARIABLE,7,4)+')' AS PHONE_NUMBER,

    SUBSTRING(@VARIABLE,11,LEN(@VARIABLE)) AS EXT_NUMBER

    --style 3

    SELECT SUBSTRING(@VARIABLE,1,3)+'.'+SUBSTRING(@VARIABLE,4,3)+'.'+SUBSTRING(@VARIABLE,7,4) AS PHONE_NUMBER,

    SUBSTRING(@VARIABLE,11,LEN(@VARIABLE)) AS EXT_NUMBER

    just need to copy and run the code , try changing the values to the @variable string.....

    let me know the bugs my script have.....

    thanks @ regards,

    Mithun

  • Hey Shamshad... any feedback on this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply