How to break zipcodes fileds with leading zeros

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    According to your sample data and results, the "Answer" you marked is not correct.  For example, you stated that '10000' should return a 1 -- "Anything between 00000 to 10000 THEN 1" -- but that code returns a 2.

    I assumed that the OP actually wants categories for 00000 thru 09999, 10000 through 19999 - which makes more sense than including 10000 in category 1, 20000 in category 2, etc.

    With that said - I still would recommend building a category table instead of calculating the value.

    Why?  If what you're saying is true, then just use the first char in the value as the category, that's less overhead than any kind of table lookup.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    Why?  If what you're saying is true, then just use the first char in the value as the category, that's less overhead than any kind of table lookup.

    Because the users will want a description of each category - not some numeric value that means nothing to them.  Yes - it is easy to calculate the value but that value has no meaning to the users.

    If you don't have a table with those values, then every developer - whether that is in SQL or in the application code - will convert the values and there will be no consistency.

    If you use a mathematical expression or just take the first character, then what do you do when the users want to break out the 90000 thru 99999 into separate categories?  But - they don't need 20000 thru 29999 broken out into separate categories?  Do you then alter every piece of code where you used this calculation - and every piece of code where the developer converted the code to a description?

    And what if your system needs to be able to track US, Canada and Mexico postal codes - and provide different categories for those?  You can expand a table to include a country code - how do you do that in code and account for alphanumeric Canadian postal codes?

    So yes, my recommendation is to build a lookup table.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    Why?  If what you're saying is true, then just use the first char in the value as the category, that's less overhead than any kind of table lookup.

    Because the users will want a description of each category - not some numeric value that means nothing to them.  Yes - it is easy to calculate the value but that value has no meaning to the users.

    If you don't have a table with those values, then every developer - whether that is in SQL or in the application code - will convert the values and there will be no consistency.

    If you use a mathematical expression or just take the first character, then what do you do when the users want to break out the 90000 thru 99999 into separate categories?  But - they don't need 20000 thru 29999 broken out into separate categories?  Do you then alter every piece of code where you used this calculation - and every piece of code where the developer converted the code to a description?

    And what if your system needs to be able to track US, Canada and Mexico postal codes - and provide different categories for those?  You can expand a table to include a country code - how do you do that in code and account for alphanumeric Canadian postal codes?

    So yes, my recommendation is to build a lookup table.

    No... that's what parameters in stored procedures are used for. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    According to your sample data and results, the "Answer" you marked is not correct.  For example, you stated that '10000' should return a 1 -- "Anything between 00000 to 10000 THEN 1" -- but that code returns a 2.

    I assumed that the OP actually wants categories for 00000 thru 09999, 10000 through 19999 - which makes more sense than including 10000 in category 1, 20000 in category 2, etc.

    With that said - I still would recommend building a category table instead of calculating the value.

    If you look at the example case statements the op provided, you assumption is correct.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    Why?  If what you're saying is true, then just use the first char in the value as the category, that's less overhead than any kind of table lookup.

    Because the users will want a description of each category - not some numeric value that means nothing to them.  Yes - it is easy to calculate the value but that value has no meaning to the users.

    If you don't have a table with those values, then every developer - whether that is in SQL or in the application code - will convert the values and there will be no consistency.

    If you use a mathematical expression or just take the first character, then what do you do when the users want to break out the 90000 thru 99999 into separate categories?  But - they don't need 20000 thru 29999 broken out into separate categories?  Do you then alter every piece of code where you used this calculation - and every piece of code where the developer converted the code to a description?

    And what if your system needs to be able to track US, Canada and Mexico postal codes - and provide different categories for those?  You can expand a table to include a country code - how do you do that in code and account for alphanumeric Canadian postal codes?

    So yes, my recommendation is to build a lookup table.

    We are a shipping logistics company.  So naturally we deal with zips A LOT.  U.S. shipping zones are typically based off the first 3 zip digits, like 100 (NYC) to 891 (Las Vegas).  Thus, only using 1 char seems like a quick, local thing.  I'm not doing a ton of work to go beyond that until I know for sure the business needs it.

    As to CAN, where are you going to get your full list of CAN zip codes to put into the table?  How then how will you keep that table current?  That applies to the U.S. for that matter.

    In summary, if requirements get more sophisticated, I would use a table, but it would be based on prefix / pattern, not just looking up every zip code.  To me, that's just not practical.  We cache the actual zip results we find rather than trying to pre-load zip-related tables.  There's just too many combinations of zips (some of which don't make sense and thus aren't really needed) to pre-load tables with that data.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    We are a shipping logistics company.  So naturally we deal with zips A LOT.  U.S. shipping zones are typically based off the first 3 zip digits, like 100 (NYC) to 891 (Las Vegas).  Thus, only using 1 char seems like a quick, local thing.  I'm not doing a ton of work to go beyond that until I know for sure the business needs it.

    As to CAN, where are you going to get your full list of CAN zip codes to put into the table?  How then how will you keep that table current?  That applies to the U.S. for that matter.

    In summary, if requirements get more sophisticated, I would use a table, but it would be based on prefix / pattern, not just looking up every zip code.  To me, that's just not practical.  We cache the actual zip results we find rather than trying to pre-load zip-related tables.  There's just too many combinations of zips (some of which don't make sense and thus aren't really needed) to pre-load tables with that data.

    I think we are saying the same thing - just from a different point of view.  I stated originally that if the OP needed any descriptive values available then I would recommend using a table.  I still recommend using a table if there are additional requirements.

    If this is a one-time grouping used only in this code - then no need for a table.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    ScottPletcher wrote:

    Why?  If what you're saying is true, then just use the first char in the value as the category, that's less overhead than any kind of table lookup.

    Because the users will want a description of each category - not some numeric value that means nothing to them.  Yes - it is easy to calculate the value but that value has no meaning to the users.

    If you don't have a table with those values, then every developer - whether that is in SQL or in the application code - will convert the values and there will be no consistency.

    If you use a mathematical expression or just take the first character, then what do you do when the users want to break out the 90000 thru 99999 into separate categories?  But - they don't need 20000 thru 29999 broken out into separate categories?  Do you then alter every piece of code where you used this calculation - and every piece of code where the developer converted the code to a description?

    And what if your system needs to be able to track US, Canada and Mexico postal codes - and provide different categories for those?  You can expand a table to include a country code - how do you do that in code and account for alphanumeric Canadian postal codes?

    So yes, my recommendation is to build a lookup table.

    I strongly suspect that the OP wouldn't have needed to ask the original question if they had to worry about all of that.  😀

    However, you did remind me of a couple of things that I apparently lost my mind on for this problem...

    1. The op asked a simple question and we provided a couple of simple examples.  The question might even be an interview question or a question on homework or test.  Yes, the problem can be done  using only math and, if looking just at the face value of the problem, the answers would be ok but there's an opportunity to teach here (which we would certainly be condemned for on StackOverflow) that I certainly and mistakenly overlooked.
    2. What does need to be taught (and you're trying to teach it) is that the formula for classifying the ZIP code provides a value, say, a "slab" as the op indirectly referred to it as.  The next question would be, why is creating "slab" (ok... call it a "category" of some sort) important and why does that value need to be presented?  It's like having a "status" column that contains an "A", "C", "I", "P", or "R".  Even if there were a "magic formula" that calculated those status values and even if no user in the world would ever see them or need to know what they are, there are two things missing... instant obvious knowledge of what they are just by looking at the them (especially for troubleshooting) and a guarantee as to what values can be in the narrow usage column.  Instead, you'd have to find them in the code an science it out to know for sure... not good, especially in undocumented code that won't say anymore that what the replacements for the case statement in this problem have done.  The other thing is that there's nothing to prevent bogus/incorrect values from appearing even if the formula actually is the way to go.

      And so you've reminded me of something I strongly enforce at work... If you can't tell what it means just by looking at it, you need a reference table with a description to explain it (we even have a column and table naming convention for all that) AND that reference table MUST be cited in a Foreign Key on that column.  Even if all esoteric knowledge in the company gets wiped out by the same bus on the same day, someone will be able to easily figure out what it's there for, etc, and they won't find out the hard way.

    So, yes... I first owe you an apology... and I totally agree... a lookup table is absolutely needed here.  The reference table is needed even if you don't use it to do a range lookup (although that would also add some flexibility without having to change code in the future) while providing DRI.

    And second, thanks for being tenacious and sticking to your guns on something that you're right about.  Good form and well done, Mr. Williams!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This has been an interesting discussion.

    I, too, despise magic numbers, especially when developers assume because they know it now, everyone in the future will remember.

    I prefer a small lookup table, even one that has a column for the value to use in the CASE and a column for a description. That solves a lot of problems later, as Mr. Williams and Mr. Moden noted.

  • But any lookup table does NOT need to contain a list of all zip codes as originally stated.  If the "slab" is what is significant, the lookup only needs to be on slab value, not on the entire zip.  And, in that case, the easiest method of deriving the "slab" is still preferable.

    It's hard to imagine there's any real meaning from only the first digit of a zip.  If it's an internal grouping for some reason, then name it to document what it is.  For example, if it controls some processing at that company, then name it "processing_group" or something similar.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    But any lookup table does NOT need to contain a list of all zip codes as originally stated.  If the "slab" is what is significant, the lookup only needs to be on slab value, not on the entire zip.  And, in that case, the easiest method of deriving the "slab" is still preferable.

    It's hard to imagine there's any real meaning from only the first digit of a zip.  If it's an internal grouping for some reason, then name it to document what it is.  For example, if it controls some processing at that company, then name it "processing_group" or something similar.

    Where was it stated that a lookup table should have all zip codes?  The lookup I recommended was a category range lookup table - not every zip code.  If you have to account for different country postal codes - you can add those ranges with a key lookup column for the country, or separate tables and code that branches...but we are not talking about a lookup of all zip codes.

    As you pointed out - there are other ways to create the ranges, such as your suggestion for using shipping zones (first 3 digits of the zip).

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Again, I'm still sticking with code, such as a function, for this rather than a physical table until I see an actual need for a table.  The first digit alone of a U.S. zip code is meaningless for any real-world business function.  That alone is not worth the overhead of a table here -- yet, without further input as to why a table would actually be needed.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I'll still disagree. This might be as simple as a table for sales

    ZipCodeLogic
    Salesperson ZipCodeStart Reason
    ----------- ------------ ---------
    1 0 upper NE territory
    2 1 NY area
    15 2 Mid Atlantic sales
    etc.

    Then I can use ZipCodeStart in some programming logic.

  • And I still say you need business input before deciding on what to put in any table you create.  I don't think you can properly model the data until then, and without modeling it first, you shouldn't be creating a table.  Now, I know developers ignore that rule all the time, but it doesn't mean they should ignore it, just that they're lazy enough to do it.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    And I still say you need business input before deciding on what to put in any table you create.  I don't think you can properly model the data until then, and without modeling it first, you shouldn't be creating a table.  Now, I know developers ignore that rule all the time, but it doesn't mean they should ignore it, just that they're lazy enough to do it.

    Ok.. so I have to disagree with you on this one, Scoot.  In that case (a case that I'm actually reminded that I did just last week with a nearly identical but slightly more complex issue), I'd invoke (like I did) my "Management Approved" power as a DBA to say that the code doesn't go unless such a reference table exists.  I have a ton of such "business inputs" that fly in the face of anything and everything including "modeling it first" because previous folks took on the attitude that "business input" is always right.  It doesn't take a genius to understand that wasn't done in the case of a hard-coded solution that had no DRI.

    The people that I accosted with that suggestion last week (on an IVR system) were actually thankful especially since I told them what their "business input" needed to me and designed it on the spot and submitted it for approval.  I DO, however, agree that a DBA is also under the same rules as any Developer.  Though shall NOT deploy untested, unapproved objects or code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Who t.h. is "Scoot", Jeft?

    Hmm, so you allow ad-hoc tables with NO design process or thought, i.e. no data modeling, to be created and deployed?  Interesting.

    At any rate, how do you know such a reference table doesn't already exist?  If the info was useful for the business, couldn't have it been useful before?  We review these types of requests and 1/3 to 60% of the time such a table already exists.

    If you have multiple dbs, in which db should the table reside?  If it's a general reference table -- as this one would certainly seem to be --  that can be a key q.  Of course, once you make it a FK, it must reside in the same db.  Will that allow other dbs to use it properly?  Does the company have a standard "utility" / reference db that the should go in instead?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply