Do You Still Use CHAR?

  • Jeff Moden wrote:

    Caruncles wrote:

    CHAR has bitten me numerous times...

    How so?  Especially when it comes to single byte stuff that even you know will never change?

    I'd say it's time to stop going out with Char!

    Sorry, I had a high school friend called Char, short for Charlene.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • skeleton567 wrote:

    Just got a good laugh at the end of this conversation.  Saw the following line.  Is this a limitation of a CHAR data element?

    Viewing 11 posts - 1 through 10 (of 10 total)

    Now THAT's funny!  And, when I clicked "LIKE" on your post, it seemed like it didn't register because it didn't change to "UNLIKE" like it used to.  Now, you have to press the F5 key to get it to change.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • skeleton567 wrote:

    Jeff Moden wrote:

    Caruncles wrote:

    CHAR has bitten me numerous times...

    How so?  Especially when it comes to single byte stuff that even you know will never change?

    I'd say it's time to stop going out with Char!

    Sorry, I had a high school friend called Char, short for Charlene.


    BWAAA_HAAA_HAAA!!!! DAMMIT!  I almost inhaled my nicotine lozenge on that one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eric M Russell wrote:

    skeleton567 wrote:

    Thought crosses my mind that maybe the use of CHAR data is a good way to have to revisit the Y2K days.

    As of 2014, Facebook recognizes 58 variations of gender. But we still have plenty of time to retire before it expands beyond the capacity of CHAR(1) or TINYINT.

    https://abcnews.go.com/blogs/headlines/2014/02/heres-a-list-of-58-gender-options-for-facebook-users

    Eric, I thought I had a pretty good way to simplify that back down to three codes,  but then I realized my solution using M-F-F actually had a duplication...

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • "How so?  Especially when it comes to single byte stuff that even you know will never change?"

    As soon as you know it will never change - it DOES.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • I tend to overestimate when designing table schemas, using varchar or nvarchar when it's not max. So I agree, Steve: if I have an invoice column and the design specification says 10 characters max, I'll overestimate a little for future changes and make it 15 to 20.
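
    In DDL terms, that padding looks something like this minimal sketch (the dbo.Invoice table and its columns are made up for illustration):

    -- Design spec says invoice numbers are at most 10 characters; the column
    -- is declared a bit wider to absorb future growth without a schema change.
    CREATE TABLE dbo.Invoice
    (
        InvoiceID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        InvoiceNumber varchar(20)       NOT NULL
    );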

  • Have any of you realized that the four-digit year is only good for another 7977 years?  Better get ready.


    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • skeleton567 wrote:

    Have any of you realized that the four-digit year is only good for another 7977 years?  Better get ready.

     

    Is that going to be their Y2K dilemma?

  • There are probably a lot of databases out there with CHAR(1) Gender columns with a check constraint ensuring that only M, F, and (maybe) "X" can be stored. Most of those will need to be updated at some point. But that doesn't make the initial creation of the column as a CHAR(1) wrong.

    I use CHAR(1) for alphanumeric codes that I know will only ever contain one character, or CHAR(2) for such columns where all valid values are exactly two characters. And they often have check constraints, unless they have an FK to a lookup table of valid values.
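
    A minimal sketch of both patterns, with invented table and constraint names:

    -- A CHAR(1) code column restricted by a check constraint:
    CREATE TABLE dbo.Person
    (
        PersonID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        Gender   char(1) NOT NULL
            CONSTRAINT CK_Person_Gender CHECK (Gender IN ('M', 'F', 'X'))
    );

    -- A CHAR(2) lookup table of valid values, for columns validated by an FK instead:
    CREATE TABLE dbo.StateCode
    (
        StateCode char(2) NOT NULL PRIMARY KEY   -- every valid value is exactly two characters
    );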

  • skeleton567 wrote:

    Have any of you realized that the four-digit year is only good for another 7977 years?  Better get ready.

    I hope you and I are around to argue about that important cutoff date.  😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I haven't had any issues with CHAR columns lately, but we have had to adjust the size of our claim number (was CHAR(8)) and policy number (was CHAR(9)) columns in the last few years to handle outside companies' values.

    But this week I had an issue come up.  We have a claimant number field; this is a number assigned within a claim for the different claimants on the claim.  Most of the time you may have a 1 or 2 on a claim, but you could see up to 10 or more.  So the claimant number field was set up as a smallint.  Plenty of room, right?

    That was all fine and dandy until we started bringing in outside claim info from other companies.  Now, we had given them the specs for what the field should be, and for several years we didn't have any issues.  This week I found out they had been putting a 'record id' key into this field, and it finally maxed out what a smallint can hold.  So I had to make some quick changes to the database tables to make them int, and then had to go through and 'fix' the SSIS metadata in various packages to handle the larger values.
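
    The column change itself was the easy part; a sketch with invented names (the SSIS metadata was the real chore):

    -- Widen the column from smallint (max 32,767) to int (max 2,147,483,647).
    ALTER TABLE dbo.Claimant
        ALTER COLUMN ClaimantNumber int NOT NULL;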

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 wrote:

    I haven't had any issues with CHAR columns lately, but we have had to adjust the size of our claim number (was CHAR(8)) and policy number (was CHAR(9)) columns in the last few years to handle outside companies' values.

    But this week I had an issue come up.  We have a claimant number field; this is a number assigned within a claim for the different claimants on the claim.  Most of the time you may have a 1 or 2 on a claim, but you could see up to 10 or more.  So the claimant number field was set up as a smallint.  Plenty of room, right?

    That was all fine and dandy until we started bringing in outside claim info from other companies.  Now, we had given them the specs for what the field should be, and for several years we didn't have any issues.  This week I found out they had been putting a 'record id' key into this field, and it finally maxed out what a smallint can hold.  So I had to make some quick changes to the database tables to make them int, and then had to go through and 'fix' the SSIS metadata in various packages to handle the larger values.

    None of these are actually issues with the char data type - in my opinion.  Would you set up an IDENTITY column and define it as bigint - because at some point in the future you might exceed the limitations of an integer column?  But - you don't know if that will happen, only that it could happen - maybe, in 10 or more years, because of an acquisition?

    Why would you want to use varchar(10) in place of a char(10) when the value in that column will always have 10 characters?  For example, NPI numbers are 10 characters, will never be less than 10 characters, and it is extremely unlikely that this value will ever be increased to 11 or more characters.  So, to be 'safe', you set it to varchar(15) just in case... and then program around the invalid entries that have more than 10 characters, because some user will prefix the data entered, another user will postfix the data, and yet another will add dashes or asterisks or...

    Not to mention issues with updating varchar data - that can and will cause fragmentation.
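
    To illustrate the NPI example above (table and constraint names are invented): because char(10) pads short values with spaces, a ten-digit pattern in a check constraint ends up enforcing both content and length.

    CREATE TABLE dbo.Provider
    (
        ProviderID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        NPI        char(10) NOT NULL
            -- a short value would leave pad spaces that fail the digit pattern,
            -- so exactly ten digits are required
            CONSTRAINT CK_Provider_NPI
                CHECK (NPI LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
    );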


    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Johan Bijnens wrote:

    Spot on, Jeff !

    The main point for people to prefer varchar over char is that their GUI development tool doesn't have to bother with trailing spaces.

    Lordy.  I understand that concatenation of CHAR(N) can lead to issues, but I wasn't aware of any other issues that a "GUI Development Tool" might have with trailing spaces.  Since I haven't even touched a GUI Dev Tool in about 2 decades, can you tell me what type of issues there are with trailing spaces there?  For example, are trailing spaces included in string comparisons and you can't set a default to ignore trailing spaces?  That would be a bugger.
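
    For reference, here's what SQL Server itself does with the pad spaces. The client side is the hedged part: languages like C# and Java compare strings exactly, so a padded CHAR value won't equal its trimmed counterpart unless the GUI code trims it first.

    -- Server-side behavior of a padded CHAR value:
    DECLARE @fixed char(10) = 'abc';   -- stored as 'abc' plus seven pad spaces

    SELECT CASE WHEN @fixed = 'abc'
                THEN 'equal' ELSE 'not equal'
           END                 AS EqualityTest,   -- 'equal': = ignores trailing spaces
           LEN(@fixed)         AS LenResult,      -- 3:  LEN also ignores them
           DATALENGTH(@fixed)  AS BytesStored,    -- 10: but the pad spaces are really there
           @fixed + '!'        AS ConcatResult;   -- 'abc       !': and concatenation keeps them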

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A popular college MIS system uses CHAR for many fields, and this often leads to issues, though not in the software itself, as that is built to account for it.  However, data is often extracted and dumped into Excel or another system, and then formulas don't work because the data is padded with spaces if the report writers forget to RTRIM it.

    It is used on departmental codes and reference numbers, but each provider uses different lengths, so it makes no sense for those to be CHAR; it just seems to generate additional work, both in the software application and when exporting data to other systems.  So I can't really see the advantage unless a field will always be a certain length, such as a Y or N field, but then BIT would be better in that instance.  For most everything else there are exceptions, unless perhaps it is a national insurance number or something.
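
    Where the report writer does remember, the fix is a one-function wrapper at extract time; a sketch with invented table and column names:

    -- Trim the CHAR pad spaces before the data reaches Excel or another system.
    SELECT RTRIM(DepartmentCode)  AS DepartmentCode,
           RTRIM(ReferenceNumber) AS ReferenceNumber
    FROM   dbo.Enrolment;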

  • OK, here's my take from my years of designing and working with data systems.  Many of you have talked about working with data of a 'fixed' length that suddenly changes, which can cause difficult issues to handle in a timely fashion.

    Constraints such as a standard, minimum, or maximum length or content DO NOT BELONG IN YOUR DATABASE DESIGN.

    PERIOD.

    Your database needs to be designed to handle future as well as current requirements.  OK, so you ask: exactly what does that mean?

    Your data types and structures have to be designed to make the best LONG TERM INTERNAL use of them.  The ability to store, access and USE your data in the best and most efficient manner has to take priority.

    So how do you accomplish this?  As I see it, this is all based on the PROCESSES you use to ACQUIRE, CLEAN, VALIDATE, and FORMAT the data before it is ever placed in the database itself.  From a development standpoint, it is far easier to alter your validation and type conversion code than it is to alter a final structure which may affect any number of applications.  You must isolate the external data requirements from your internal storage decisions.

    SQL Server offers an extensive array of built-in functions for doing this.  I have recently created a process of seven pages of SQL code (it is the only skill this old guy has left) to process data that is extracted from a current commercial product and then run through another commercial data modification product before it is ready to be stored in SQL Server.  There are a number of design decisions in these products that essentially make the raw extracted data worthless without some major attention.  For instance, historical data contains two-digit character-format years in dates, with actual modification of the data CONTENT to handle the century, and fractional data in character format, even with slashes, all with embedded spaces.  And for accuracy, this fractional data needs to be converted and rounded to a maximum of six digits to the right of the decimal.
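
    To give a flavor of what those nested built-in calls can look like, here is a minimal sketch of the two cases just described; the sample values are invented, and the real cleansing code is of course far more involved.

    -- Two-digit character-format year: style 1 is mm/dd/yy, so the server's
    -- two-digit-year cutoff supplies the century.
    DECLARE @rawDate varchar(10) = ' 12/31/49 ';
    SELECT TRY_CONVERT(date, LTRIM(RTRIM(@rawDate)), 1) AS CleanDate;     -- 2049-12-31

    -- Character-format fractional data with a slash and embedded spaces,
    -- converted and rounded to six digits right of the decimal.
    DECLARE @rawFraction varchar(20) = ' 1 / 8 ';
    SELECT ROUND(
               TRY_CONVERT(decimal(18,9),
                   PARSENAME(REPLACE(REPLACE(@rawFraction, ' ', ''), '/', '.'), 2))
             / TRY_CONVERT(decimal(18,9),
                   PARSENAME(REPLACE(REPLACE(@rawFraction, ' ', ''), '/', '.'), 1))
           , 6) AS CleanFraction;                                         -- 0.125000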

    My solution, due to my non-existent front-end programming skills, is to use SSIS to import this data in its raw format and create SQL code to make the massive alterations needed before final storage.  In fact, the data from the first application is imported, cleaned, validated, and then exported back to text to be passed through the second application for logical structural changes, which then prepares it for final import and storage in the data structures.

    I just took time to go through the SQL code and count the calls I make to the built-in functions, which for the most part are going to be the most efficient way to handle things.  In seven pages of SQL, I make 147 calls to built-in functions, nested to a depth of up to four.

    Why do this?  It gives me the ability to make the internal storage of data pretty much independent of the external size/type/format requirements of the outside world.  Need to increase the size of a character data item from 11 to 14 characters?  Fine.  Just modify your data cleansing and validation code to allow the additional characters and you're done.  By using variable-length and larger maximum data item sizes, I can greatly reduce the impact of changing requirements on my structures.
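
    In practice, that change can be as small as redefining one validation rule in the staging layer; a hypothetical sketch (the staging table and constraint names are invented):

    -- Accept 14 characters instead of 11; the storage column is already
    -- variable-length and generously sized, so it doesn't change at all.
    ALTER TABLE stage.InboundItem DROP CONSTRAINT CK_InboundItem_ItemCode;

    ALTER TABLE stage.InboundItem
        ADD CONSTRAINT CK_InboundItem_ItemCode
        CHECK (LEN(ItemCode) BETWEEN 1 AND 14);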

    Your front-end developers will most likely be pretty unhappy with you, because this requires that their code in turn be able to do adequate validation and to handle rejection of data before final storage.  But by doing this you can save yourself massive headaches and your company massive development costs to make minor alterations.

    I refer you to the expansion of the historic gender designations from two single characters to the nearly sixty 'socially acceptable' designations that will probably need an identity and foreign key relationship.  Who is going to care if your data validation suddenly has to handle the 60 gender identity values instead of simply 1, 2, and possibly 3?  I'm actually old enough to remember when we only had to handle 1 and 2, but it was a rare occasion when my DB design couldn't handle the changes.

    Remember, validate, translate, and convert the data BEFORE it hits your structures.


    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )
