Changing 1 or + in a telephone number to the correct format

  • Hi everyone
    I have various records in a database that contain US telephone numbers in different formats. Some are correctly formatted (e.g. 001 123 456 7890) whereas others are incorrect e.g. 1 123 456 7890 or +1 123 456 7890.

    I'd like to find a way to update the incorrectly formatted numbers using SQL.

    Would I need to copy the numbers to a new table, amend them and then copy them back, or is there an easier method?

    Many thanks
    Jon

  • j.clay 47557 - Monday, October 29, 2018 3:12 AM

    Hi everyone
    I have various records in a database that contain US telephone numbers in different formats. Some are correctly formatted (e.g. 001 123 456 7890) whereas others are incorrect e.g. 1 123 456 7890 or +1 123 456 7890.

    I'd like to find a way to update the incorrectly formatted numbers using SQL.

    Would I need to copy the numbers to a new table, amend them and then copy them back, or is there an easier method?

    Many thanks
    Jon

    Given that only 1 and +1 need to replaced as 001. You can try the below logic


    create table telephone
    (
    telephone_number varchar(200)
    );
    insert into telephone values ('+1 1234567890');
    insert into telephone values ('1 1234567890');
    insert into telephone values ('001 1234567890');

    update telephone set telephone_number=case when len(telephone_number)=14 then telephone_number
    else concat('001 ',right(telephone_number,10)) end

    Saravanan

  • Brilliant, thank you 🙂

  • What about the spaces between area code, exchange, and line that were originally posted?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, October 29, 2018 6:51 AM

    What about the spaces between area code, exchange, and line that were originally posted?

    Good point. Also, the above logic applied to a correctly format number (such as the example 001 123 456 7890) would end up being incorrect with the above code; the 10 most right characters of that string is 3 456 7890 (so the 12 is lost at the beginning is lost).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Do you know if the incoming numbers will always have 4 groupings with a single space between them?

    Or might you get numbers like
    1-800-123-4567
    1(800)123-4567
    1 (800) 123-4567
    (800) 123-4567
    8001234567
    etc.

    If you don't know how many spaces there might be, if there's a leading 1 or +1 or 001, or other conditions, I'd first strip out all spaces, dashes and parenthesis using REPLACE multiple times.  Then you know the rightmost 10 characters are the telephone number.  You can then append 001 to the beginning of those 10 characters, and add in spaces between the numbers as needed.

    This falls apart if you ever get a phone number without an area code.  Or if one of the digits are missing.  But bad data is bad data.

  • gvoshol 73146 - Tuesday, October 30, 2018 5:45 AM

    Do you know if the incoming numbers will always have 4 groupings with a single space between them?

    Or might you get numbers like
    1-800-123-4567
    1(800)123-4567
    1 (800) 123-4567
    (800) 123-4567
    8001234567
    etc.

    If you don't know how many spaces there might be, if there's a leading 1 or +1 or 001, or other conditions, I'd first strip out all spaces, dashes and parenthesis using REPLACE multiple times.  Then you know the rightmost 10 characters are the telephone number.  You can then append 001 to the beginning of those 10 characters, and add in spaces between the numbers as needed.

    This falls apart if you ever get a phone number without an area code.  Or if one of the digits are missing.  But bad data is bad data.

    ... it also falls apart if extensions are used or the numbers come from outside of NANPA specifications (the North American Numbering Plan).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I suggest generating a pattern to identity what the phone string contains.  Then all you need to do is decide which patterns are acceptable and which aren't.

    For example, use a 9 to represent a digit, a D to represent a delimiter (space, dot or dash) (naturally if you prefer, you could use B for blank and another char to represent a non-blank delim), and +() to represent themselves.


    ;WITH
    cteTally10 AS (
         SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cteTally100 AS (
         SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
         FROM cteTally10 c1
         CROSS JOIN cteTally10 c2
    )
    SELECT *
    FROM #data d
    CROSS APPLY (
         SELECT ((SELECT
             CASE WHEN SUBSTRING(phone_number, t.number, 1) LIKE '[0-9]' THEN '9'
                WHEN SUBSTRING(phone_number, t.number, 1) LIKE '[ .-]' THEN 'D'
                WHEN SUBSTRING(phone_number, t.number, 1) LIKE '[+()]' THEN
                SUBSTRING(phone_number, t.number, 1)
              ELSE '?' END + ''
        FROM cteTally100 t
        WHERE t.number BETWEEN 1 AND LEN(phone_number)
        FOR XML PATH(''))) AS phone_pattern
    ) AS alias1

    /*

    CREATE TABLE #data
    (
      phone_number varchar(30) NOT NULL
    )
    INSERT INTO #data VALUES
    ('001 123 456 7890'),
    ('1 123 456 7890'),
    ('+1 123 456 7890'),
    ('1-800-123-4567'),
    ('1(800)123-4567'),
    ('1 (800) 123-4567'),
    ('(800) 123-4567'),
    ('8001234567')

    */

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jeff Moden - Monday, October 29, 2018 6:51 AM

    What about the spaces between area code, exchange, and line that were originally posted?

    I thought space is a additional burden when it comes in between telephone numbers and space may take at at least 1 byte for blank space. so I intentionally removed it.

    Saravanan

  • saravanatn - Wednesday, October 31, 2018 1:34 AM

    Jeff Moden - Monday, October 29, 2018 6:51 AM

    What about the spaces between area code, exchange, and line that were originally posted?

    I thought space is a additional burden when it comes in between telephone numbers and space may take at at least 1 byte for blank space. so I intentionally removed it.

    But the original data has spaces in it and you've not accounted for that in the code, which means the original data will be truncated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • May I say something really offensive?
    Please don't beat me too hard...

    Data normalization.

    Store country code, area code, local number separately from each other.

    j.clay 47557 - Monday, October 29, 2018 3:12 AM

    Hi everyone
    I have various records in a database that contain US telephone numbers in different formats. Some are correctly formatted (e.g. 001 123 456 7890) whereas others are incorrect e.g. 1 123 456 7890 or +1 123 456 7890.

    I'd like to find a way to update the incorrectly formatted numbers using SQL.

    Would I need to copy the numbers to a new table, amend them and then copy them back, or is there an easier method?

    Many thanks
    Jon

    Those numbers are formatted correctly, just not in accordance with the international calling rules of the country you have in mind.

    But the number you label "correctly formatted" might not be such, if it's dialed from another country.

    For example, the call the US number +1 123 456 7890 from New Zealand you need to dial the sequence you named "correct": 0011234567890.
    But to call the same number from Australia you need to dial this sequence: 001111234567890.
    Calling the same number from Russia would require following sequence: 8 {pause for dial tome}1011234567890

    So, keep the parts of the number separated, and build an appropriate dialing sequence on user's request, according to the rules specified for the current user's location.

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, November 1, 2018 6:07 AM

    May I say something really offensive?
    Please don't beat me too hard...

    Data normalization.

    Store country code, area code, local number separately from each other.

    j.clay 47557 - Monday, October 29, 2018 3:12 AM

    Hi everyone
    I have various records in a database that contain US telephone numbers in different formats. Some are correctly formatted (e.g. 001 123 456 7890) whereas others are incorrect e.g. 1 123 456 7890 or +1 123 456 7890.

    I'd like to find a way to update the incorrectly formatted numbers using SQL.

    Would I need to copy the numbers to a new table, amend them and then copy them back, or is there an easier method?

    Many thanks
    Jon

    Those numbers are formatted correctly, just not in accordance with the international calling rules of the country you have in mind.

    But the number you label "correctly formatted" might not be such, if it's dialed from another country.

    For example, the call the US number +1 123 456 7890 from New Zealand you need to dial the sequence you named "correct": 0011234567890.
    But to call the same number from Australia you need to dial this sequence: 001111234567890.
    Calling the same number from Russia would require following sequence: 8 {pause for dial tome}1011234567890

    So, keep the parts of the number separated, and build an appropriate dialing sequence on user's request, according to the rules specified for the current user's location.

    What makes it even more fun are the different "standards". In a great many countries, telephone numbers include a country code of 1 to 3 digits, a city code of 0 to 5 digits, and then a line code. As Sergiy points out, there may be addition prefixes and, last I'd seen was that telephone numbers can be up to 21 digits without those prefixes or extensions.  Then, certain countries like Mexico (unless they've changed it more than a decade ago) use a thing called "Banding", which is quite different.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'd argue the version starting with the + is the "correct" format - that should be valid for calling from anywhere in the world. By convention the international access code is replaced by a + when writing an international number
    for the UK this is 0 and as Sergiy has indicated for the US and New Zealand it is 00, for Australia it is 0011 for Russia 8 etc.
    Most if not all countries will allow you to dial a national number without the international code so I would expect any dataset would be likely to include those without the international code. Again you may be able to dial a local number without the area code so again those may be present.
    This will matter if you need to call those numbers from outside the US and if you won't ever need to why store the international prefix anyway?

    Also have you converted any numbers with alpha representations to their numeric equivalents? I've only seen this used for US numbers but advertisers seen to love it there.
    I also presume you don't need to handle operator service numbers including the emergency operator which will be shorter.

    I'd also argue that you would be better stripping out any non numeric  characters except + when storing it - the telephone exchange will anyway so its just a convention to make them easier to read and that convention may vary depending on culture so handle that level of formatting in your UI doing this will reduce your storage requirements and also help identify cases where you have duplicate but differently formatted numbers.

    0123 456 7890
    01 23 45678 9 0
    01234567890

    are all the same telephone number but will not be picked up as such from simple string comparisons.

    To add to the fun and games ROI mobile numbers also come with a voicemail number which is identical except for one extra digit making them 12 digits rather than 11 for the national number.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply