Phone Number Validation

  • N.B. - Friday, June 29, 2018 12:51 PM

    Greetings,

    I'm working on a project where we have to take a list of unique phone numbers in our database and scrub them against the National DNC list - that part isn't in question here.

    For legal reasons, we have to use a third-party vendor that maintains the DNC list - we upload our list, they scrub, and then send back.  However, they charge per-number to scrub, which means that it is in our best interests to remove invalid phone numbers (e.g. 9999999999) from the list before sending.

    So while I can write obvious rules for numbers like the above, there are thousands more variations that aren't valid - either due to the customer mis-inputting, or the Agent doing so or whatever - and my manual list would quickly grow out of control.

    Does someone have, or is aware of, a function, etc. that does basic checks against a 10-digit phone number to help identify ones that are invalid?  I realize that some are still going to slip through, but even if I were to identify 10%, that would reduce our costs significantly.  I'm not looking to make sure that the number is active, etc. - rather that it's even valid - so that something like 1119992222 is removed.

    And yes - we keep track of the numbers we've already scrubbed so we're not re-sending them.  Our weekly volume is 500K+ new numbers.

    Shifting gears back to the original post....

    How much are you good folks paying per phone number and approximately how much per year (or week) are you paying for this service?  And, no... NOT asking just because I'm curious.  😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Tuesday, July 3, 2018 11:07 AM

    Once you've sent a 9-digit number to them, and they've validated it, you shouldn't need to send them that exact same number again.  That is, keep a cache of the numbers already verified and don't send those again.  You might want to put a time limit on the validation, so that after, say, nn months, you send them again, "just in case".

    To a certain extent that will be true, the third party may also return a status for the rejected numbers and that status may affect if you want to resend it at a later date.

    I suspect anything rejected because the user does not wish to receive unsolicited calls would be unlikely to change status - at least until the person moves and the number is reallocated.
    Other reasons may or may not change over time.
    Its less certain that you can assume that a number which is accepted will continue to be so but if you know how frequently the DNC database is updated you should be able to assume the status will be unchanged between updates.

  • DNC database is updated you should be able to assume the status will be unchanged between updates.

    And as it's updated you need to re-validate all the numbers you have validated before.

    Because you can never know which numbers were deactivated since the last update, and which ones have been added/reactivated since then.

    Which makes the exercise of saving statuses of already processed numbers quite useless.

    Any kind of status recorded a month ago may be irrelevant today.

    Especially at times of VoIP telephones.

    _____________
    Code for TallyGenerator

  • Looks to follow the standard pretty well from where I'm sitting

    You dial 81044 to call UK 0r 8101 for the US. I had already pointed out the 0 is to access code or in your case 810 - this is also why its usually written as +44 or +1

    Then the area code - yep that follows for both

    Then the subscriber number - looks the same to me

    Max of 15 digits - yep that too.

    Yes the access code varies by network operator or more generally country but you will be calling from likely a very few probably one so that's going to be known. The area code again will apply to that country - I had already stated it is unlikely you would want to validate those on international numbers so again it should be straightforward to validate against them.

    I'm certainly not proposing a validation function covering every country - tailor it to your own requirements. Just that there are standards in place.

    Let's have a look on those standards:

    Access code can start with any digit and may contain from 1to 3 digits,

    International country code may contain from 1 to 4 digits,

    Both codes may by present - or not - in any particular number;

    Area codes contain unpredictable number of any kind of digits;

    Leading zero on area code may be included or not in a number;

    Subscriber number - any kind of digits in any order, with total number from 5 to 9 (as far as I know);

    Total number of digits <15 is too loose for countries like NZ, would be quite useless for validation purposes, and would cause issues with numbers with extensions in huge megapolices.

    Well, so much of a standard.

    I can see how easily you can build a function implementing these standards, and how useful it would be for actual number validation.

    Jeff's complains about unreliability of ISNUMERIC would look totally ridiculous.

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, July 4, 2018 5:38 AM

    Looks to follow the standard pretty well from where I'm sittingYou dial 81044 to call UK 0r 8101 for the US. I had already pointed out the 0 is to access code or in your case 810 - this is also why its usually written as +44 or +1Then the area code - yep that follows for bothThen the subscriber number - looks the same to meMax of 15 digits - yep that too.Yes the access code varies by network operator or more generally country but you will be calling from likely a very few probably one so that's going to be known. The area code again will apply to that country - I had already stated it is unlikely you would want to validate those on international numbers so again it should be straightforward to validate against them.I'm certainly not proposing a validation function covering every country - tailor it to your own requirements. Just that there are standards in place.

    Let's have a look on those standards:Access code can start with any digit and may contain from 1to 3 digits,International country code may contain from 1 to 4 digits,Both codes may by present - or not - in any particular number;Area codes contain unpredictable number of any kind of digits;Leading zero on area code may be included or not in a number;Subscriber number - any kind of digits in any order, with total number from 5 to 9 (as far as I know);Total number of digits <15 is too loose for countries like NZ, would be quite useless for validation purposes, and would cause issues with numbers with extensions in huge megapolices.Well, so much of a standard.I can see how easily you can build a function implementing these standards, and how useful it would be for actual number validation.Jeff's complains about unreliability of ISNUMERIC would look totally ridiculous.

    Access code is known for your network - this is trivial to check
    Country codes are a known list - again simple to check
    Area codes are a known list for your country - yet again simple to check - I had already stated that I saw little value trying to validate outside your own country for most users
    Subscriber number - I had already stated - leave validation of that to the third party - I would expect the length to be the same for all numbers within any national dialing plan however.

    Simple enough to validate against and good enough to reject any with an invalid country or area code.

    Keep it simple. Validate sufficiently for your own needs rather than trying to cover every possible umber on the planet.

  • Jeff Moden - Tuesday, July 3, 2018 11:49 AM

    N.B. - Friday, June 29, 2018 12:51 PM

    Greetings,

    I'm working on a project where we have to take a list of unique phone numbers in our database and scrub them against the National DNC list - that part isn't in question here.

    For legal reasons, we have to use a third-party vendor that maintains the DNC list - we upload our list, they scrub, and then send back.  However, they charge per-number to scrub, which means that it is in our best interests to remove invalid phone numbers (e.g. 9999999999) from the list before sending.

    So while I can write obvious rules for numbers like the above, there are thousands more variations that aren't valid - either due to the customer mis-inputting, or the Agent doing so or whatever - and my manual list would quickly grow out of control.

    Does someone have, or is aware of, a function, etc. that does basic checks against a 10-digit phone number to help identify ones that are invalid?  I realize that some are still going to slip through, but even if I were to identify 10%, that would reduce our costs significantly.  I'm not looking to make sure that the number is active, etc. - rather that it's even valid - so that something like 1119992222 is removed.

    And yes - we keep track of the numbers we've already scrubbed so we're not re-sending them.  Our weekly volume is 500K+ new numbers.

    Shifting gears back to the original post....

    How much are you good folks paying per phone number and approximately how much per year (or week) are you paying for this service?  And, no... NOT asking just because I'm curious.  😉

    Seriously... I know of a free and relatively easy way to validate all active area codes and exchanges for all NANPA (North American Numbering Plan Association).  I'm just trying to figure out whether or not it's worth the diversion for both of us.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • crmitchell - Wednesday, July 4, 2018 6:00 AM

    Access code is known for your network - this is trivial to check
    Country codes are a known list - again simple to check
    Area codes are a known list for your country - yet again simple to check - I had already stated that I saw little value trying to validate outside your own country for most users
    Subscriber number - I had already stated - leave validation of that to the third party - I would expect the length to be the same for all numbers within any national dialing plan however.

    Simple enough to validate against and good enough to reject any with an invalid country or area code.

    Keep it simple. Validate sufficiently for your own needs rather than trying to cover every possible umber on the planet.

    "Your network", "your country" - those are the key words here.
    International conventions you were referring to have no use here.

    See what I said in my first post in this thread.
    Every area has its own rules.

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, July 4, 2018 7:13 AM

    crmitchell - Wednesday, July 4, 2018 6:00 AM

    Access code is known for your network - this is trivial to check
    Country codes are a known list - again simple to check
    Area codes are a known list for your country - yet again simple to check - I had already stated that I saw little value trying to validate outside your own country for most users
    Subscriber number - I had already stated - leave validation of that to the third party - I would expect the length to be the same for all numbers within any national dialing plan however.

    Simple enough to validate against and good enough to reject any with an invalid country or area code.

    Keep it simple. Validate sufficiently for your own needs rather than trying to cover every possible umber on the planet.

    "Your network", "your country" - those are the key words here.
    International conventions you were referring to have no use here.

    See what I said in my first post in this thread.
    Every area has its own rules.

    The rules are the same.

    Access code to make an international call again is a value the rule remains the same. It appears first. By convention it is written as + rather than the actual value. If you follow that convention then even the value is the same. - If this is not present then assume the number is a national or local number.

    Each country will have its own list of area codes - that's a difference of values not a different rule they still appear after the country code and before the subscriber number

    As for the national dialing plan - the US does differ slightly from other countries as it uses a region code after the area code but that could be handled by assuming its part of the subscriber number - validate against it if you wish.

    At the end of it all validation of phone numbers down to the level of the area code is a straightforward parsing exercise.

    As a preprocessing step I would recommend converting any alphabetic characters to numeric as per your phone keypad and stripping out any others except a possible + in the first position. If you don't live somewhere that likes to use words as phone numbers (is there anywhere other than the US that regularly does this?) then you may choose to strip out alphas as well. - That step on its own is likely to filter out a lot of invalid numbers.

    If you want the list of validation rules - here it is
    If the access code is present - either the explicit value for your locale or a + then match the next part against the list of country codes

    If  country code is valid and its not your own country code stop validation at this point and pass to the third party

    If there is no access code or if the country code matches your country validate the first ( or next) part against your countries area codes
    If this matches pass to the third party
    If this does not match then the number is either a local number or its invalid

    If the length of the number does not match that applicable to your locale then its invalid
    If the length does match then assume its a local number - I doubt if a third party will validate that as they won't know which area code it applies to.

  • A meaningful validation must implement all constraints like these:

    https://en.m.wikipedia.org/wiki/Telephone_numbers_in_Australia

    That's for Australia.

    Nothing to do with any other country.

    _____________
    Code for TallyGenerator

  • Jeff Moden - Wednesday, July 4, 2018 7:10 AM

    Jeff Moden - Tuesday, July 3, 2018 11:49 AM

    N.B. - Friday, June 29, 2018 12:51 PM

    Greetings,

    I'm working on a project where we have to take a list of unique phone numbers in our database and scrub them against the National DNC list - that part isn't in question here.

    For legal reasons, we have to use a third-party vendor that maintains the DNC list - we upload our list, they scrub, and then send back.  However, they charge per-number to scrub, which means that it is in our best interests to remove invalid phone numbers (e.g. 9999999999) from the list before sending.

    So while I can write obvious rules for numbers like the above, there are thousands more variations that aren't valid - either due to the customer mis-inputting, or the Agent doing so or whatever - and my manual list would quickly grow out of control.

    Does someone have, or is aware of, a function, etc. that does basic checks against a 10-digit phone number to help identify ones that are invalid?  I realize that some are still going to slip through, but even if I were to identify 10%, that would reduce our costs significantly.  I'm not looking to make sure that the number is active, etc. - rather that it's even valid - so that something like 1119992222 is removed.

    And yes - we keep track of the numbers we've already scrubbed so we're not re-sending them.  Our weekly volume is 500K+ new numbers.

    Shifting gears back to the original post....

    How much are you good folks paying per phone number and approximately how much per year (or week) are you paying for this service?  And, no... NOT asking just because I'm curious.  😉

    Seriously... I know of a free and relatively easy way to validate all active area codes and exchanges for all NANPA (North American Numbering Plan Association).  I'm just trying to figure out whether or not it's worth the diversion for both of us.

    I don't have the exact costs off hand, but I do know that there's a corporate contract that allows for X numbers per month; and if one of the divisions exceeds that they have to pay additional and our cost was set at approximately $1,800/mth.

    We do keep a record of all the numbers that we've already sent, and what date they were sent through, so that we're only sending "new" numbers.

    The ultimate problem that I'm seeing is that I've got thousands of numbers that appear to be invalid - either via the area code or the NXX.  Trying to validate the subscriber line portion is impossible and I realize that - I'm not trying to validate that portion.

    As previously stated, these numbers are stored as a VARCHAR(10), so there's no access code to validate - even if it is an international phone number, there is simply not enough room to store all of the valid digits, so we would want to scrub it anyway.

  • Sergiy - Wednesday, July 4, 2018 2:41 PM

    A meaningful validation must implement all constraints like these:https://en.m.wikipedia.org/wiki/Telephone_numbers_in_AustraliaThat's for Australia.Nothing to do with any other country.

    This is for the US only, and the number is stored as a varchar(10) - so concerns about international numbers are moot.

  • N.B. - Wednesday, July 4, 2018 2:46 PM

    Jeff Moden - Wednesday, July 4, 2018 7:10 AM

    Jeff Moden - Tuesday, July 3, 2018 11:49 AM

    N.B. - Friday, June 29, 2018 12:51 PM

    Greetings,

    I'm working on a project where we have to take a list of unique phone numbers in our database and scrub them against the National DNC list - that part isn't in question here.

    For legal reasons, we have to use a third-party vendor that maintains the DNC list - we upload our list, they scrub, and then send back.  However, they charge per-number to scrub, which means that it is in our best interests to remove invalid phone numbers (e.g. 9999999999) from the list before sending.

    So while I can write obvious rules for numbers like the above, there are thousands more variations that aren't valid - either due to the customer mis-inputting, or the Agent doing so or whatever - and my manual list would quickly grow out of control.

    Does someone have, or is aware of, a function, etc. that does basic checks against a 10-digit phone number to help identify ones that are invalid?  I realize that some are still going to slip through, but even if I were to identify 10%, that would reduce our costs significantly.  I'm not looking to make sure that the number is active, etc. - rather that it's even valid - so that something like 1119992222 is removed.

    And yes - we keep track of the numbers we've already scrubbed so we're not re-sending them.  Our weekly volume is 500K+ new numbers.

    Shifting gears back to the original post....

    How much are you good folks paying per phone number and approximately how much per year (or week) are you paying for this service?  And, no... NOT asking just because I'm curious.  😉

    Seriously... I know of a free and relatively easy way to validate all active area codes and exchanges for all NANPA (North American Numbering Plan Association).  I'm just trying to figure out whether or not it's worth the diversion for both of us.

    I don't have the exact costs off hand, but I do know that there's a corporate contract that allows for X numbers per month; and if one of the divisions exceeds that they have to pay additional and our cost was set at approximately $1,800/mth.

    We do keep a record of all the numbers that we've already sent, and what date they were sent through, so that we're only sending "new" numbers.

    The ultimate problem that I'm seeing is that I've got thousands of numbers that appear to be invalid - either via the area code or the NXX.  Trying to validate the subscriber line portion is impossible and I realize that - I'm not trying to validate that portion.

    As previously stated, these numbers are stored as a VARCHAR(10), so there's no access code to validate - even if it is an international phone number, there is simply not enough room to store all of the valid digits, so we would want to scrub it anyway.

    Just so you know, a subscription to the U.S. Do Not Call Registry is only $16.8 K per year for all area codes and, once you buy it, you can use it a billion times if you need to.  The $1,800/month number you quote is $21.6K per year.  You could save $4.8K per year by becoming a subscriber yourself.

    If you just want to validate 10 Character NANPA numbers for NPA and NXX, then get the free download for all currently active NPA/NXX combinations from the following link.  See the "All States" listing and related zip file.  You also find the file format and column name definition on that same page.  Read the section about the "Usage" codes.  Not all active NPA/NXXs are available to the general public and the codes in this column will allow you to determine that.  For example, nnn555 is a protected NXX in most area codes.   The list does not include the 800 (and related) series of numbers but that's probably OK for your usage because if it's not on the list, you don't want to submit it anyway.

    https://www.nationalnanpa.com/reports/reports_cocodes_assign.html

    If you need help importing the data, let me know and I'll write one for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... what the heck.  I kind of take things for granted including some of the goofy stuff NANPA did with the data in that download.  It's been that way forever and I just forgot about some of the issues that I originally had that others may also have.

    For example, they made it not only Tab Delimited (that's actually a good thing) but they also made it "Fixed Width" by adding a bunch of trailing spaces where you might not expect it and they used the Linux "Row Terminator" of just a line feed (CHAR(10) in the ASCII world).  They also made the heading just a little different from the data rows and someone trying to optimize all that for perfect fits with no trailing spaces (which can be done) will run into a bit of a problem unless they have a deep understanding of both the data and BCP format files (not saying you don't but I don't know if you do nor if you're allowed to store a BCP format file either with the datafile or on the server).  I also figure all of that is overkill for what you're trying to do.

    With that thought in mind, we'll go the simple, brute force route.  Create the following tables (change the names if you want or need to)...


     CREATE TABLE dbo.NanpaInfo
            (
             [State] CHAR(3)       
            ,NPA_NXX CHAR(7)
            ,OCN CHAR(4)
            ,Company VARCHAR(60)
            ,RateCenter CHAR(11)
            ,EffectiveDate DATE
            ,UseType CHAR(2)
            ,AssignedDate DATE
            ,IG CHAR(2)
            ,IsPooled CHAR(2)
            ,IsUpdate CHAR(2)
            ,CONSTRAINT PK_NanpaInfo PRIMARY KEY CLUSTERED (NPA_NXX)
            )
    ;
     CREATE TABLE dbo.NpaNxx
            (
             NpaNxx CHAR(6)
            ,CONSTRAINT PK_NpaNxx PRIMARY KEY CLUSTERED (NpaNxx)
            )
    ;

    ... then run the following code to populate the NanpaInfo table from the downloaded and unzipped "allutlzd.txt" (making sure you change the path to where you stored the file and, yes, it can be a UNC)...


       BULK INSERT dbo.NanpaInfo
       FROM 'C:\Temp\allutlzd\allutlzd.txt'
       WITH (
             FIELDTERMINATOR    = '\t' --This indicates a TAB character
            ,ROWTERMINATOR      = '0x0a' --This is how to do just a line-feed for a row terminator
            ,FIRSTROW           = 2  --Skips the header
            ,CODEPAGE           = 'RAW'  --Next 2 settings are "friends" of mine that have kept me out of trouble.
            ,DATAFILETYPE       = 'char'
            ,TABLOCK  --Can performance if minimal logging is possible.  Does hurt if not so I have the habit of always adding it for this type of thing.
            )
    ;

    ... and, finally, run the following to produce a well indexed, single column lookup table with only valid/assigned NpaNxx's to use to check your 500k row files against.  Again, things like the TABLOCK, ORDER BY, and OPTION(RECOMPILE) are old friends that don't hurt but can really help with minimal logging and performance if conditions are right in the database.


     INSERT INTO dbo.NpaNxx WITH (TABLOCK)
     SELECT NpaNxx = REPLACE(NPA_NXX,'-','')
       FROM dbo.NanpaInfo
      WHERE UseType = 'AS'
      ORDER BY NpaNxx
     OPTION (RECOMPILE)
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply