June 15, 2009 at 5:54 pm
Maybe it's because I'm European, but I'm afraid I don't understand the (many) respondents who claim that if the character type is unknown, you should choose the most limited option? How is that ever going to be the safest option? One foreign character (and even English language apps can have a need for foreign names, for instance) and your application breaks! I don't know any employer who'd prefer that over defensive coding?!? If you are told the character set is restricted to ascii then fine, but the question offered no such restriction.
Puzzled and somewhat concerned,
If you later find out it should be unicode, you can change it.
Don't be too concerned. Foreign names can still be spelled using just the Global English (U.S.) language. Even Prince.:-D
June 15, 2009 at 8:08 pm
Hugo Kornelis (6/12/2009)
Tom Garth (6/12/2009)πHugo - How do you store a NULL?π
In the storage area of every row, one or more bytes are reserved for the so-called "null bitmap". [...]
*) for variable length columns: the two bytes representing length are set to a length of zero, so that no other bytes are used. (So a varchar zero-length string ('') ans a varchar NULL are stored exactly the same, except for the bit in the null bitmap.)
Glad sarcasm did not stop you Hugo - I have learned a lot from this.
Two things:
- from your explanation, the min. storage space of a varchar is 2 bytes (not 3 as some seem to believe)
- I still think the correct answer should be char(1) for all the good reasons first stated by Irish Flyer - don't Unicode if you are not asked to, storage, performance and backup times are still important considerations.
Then another thing:
nobody is biting on my question (see topic varchar vs char) as to why varchar(35) seems to occupy more dataspace than char(35) when the actual data stored in both cases is only 6 bytes ...
please try to find an answer??
June 16, 2009 at 1:03 am
Ol'SureHand (6/15/2009)
Two things:- from your explanation, the min. storage space of a varchar is 2 bytes (not 3 as some seem to believe)
Minimum storage for any varchar, nvarchar or varbinary column is 2 bytes, indeed. For this QotD, where the question specified the string will always be 1 character long, actual storage size will be 3 bytes for varchar or 4 bytes for nvarchar.
- I still think the correct answer should be char(1) for all the good reasons first stated by Irish Flyer - don't Unicode if you are not asked too, storage, performance and backup times are still important considerations.
I still say that the correct answer is "char(1) or nchar(1)". You can't decide without first asking more details.
In the (unlikely) case that I ever have to implement a string column without any way to find out what range of characters can be stored, than I'd prefer to use Unicode. In a previous reaction, someone suggested that it can always be changed later. True, but that would probably be after a bug report. That's almost like saying testing is not required, because bugs can be fixed later. And also, changing a column from non-Unicode to Unicode on a terabyte-sized table is far from a simple operation!
In this specific case, however, I might be more inclined to go for non-Unicode. Because typically, string columns of lengthh 1 are used for codes, and at least in western areas of the world, codes would be taken from the "main" character set.
Then another thing:
nobody is biting on my question (see topic varchar vs char) as to why varchar(35) seems to occupy more dataspace than char(35) when the actual data stored in both cases is only 6 bytes ...
please try to find an answer??
If you post a link to the topic, I'll take a look.
June 16, 2009 at 2:47 am
If you later find out it should be unicode, you can change it.
I assume you've never been through the process of converting a large database from varchar/char/text columns to nvarchar/nchar/ntext? I have, and it took a very long time!
June 16, 2009 at 7:51 am
Hugo Kornelis (6/16/2009)
Ol'SureHand (6/15/2009)
.. why varchar(35) seems to occupy more dataspace than char(35) when the actual data stored in both cases is only 6 bytes ...
If you post a link to the topic, I'll take a look.
THANKS!
It's at
http://www.sqlservercentral.com/Forums/FindPost731965.aspx'
(or http://www.sqlservercentral.com/Forums/Topic731965-360-1.aspx)
Contains all the specifics of the situation and the experiment. Weeks later, the dataspace difference persists as I repopulate both tables every night!
Looking forward to the reasons ! and thanks a million for all the valuable advice & info in all topics!
June 16, 2009 at 8:04 am
Ol'SureHand (6/16/2009)
Hugo Kornelis (6/16/2009)
Ol'SureHand (6/15/2009)
.. why varchar(35) seems to occupy more dataspace than char(35) when the actual data stored in both cases is only 6 bytes ...
If you post a link to the topic, I'll take a look.
THANKS!
http://www.sqlservercentral.com/Forums/Topic731965-360-1.aspx
(not sure how to get a proper link.... hope this works!)
Contains all the specifics of the situation and the experiment. Weeks later, the dataspace difference persists as I repopulate both tables every night!
Looking forward to the reasons ! and thanks a million for all the valuable advice & info in all topics!
To get a proper hyperlink, you click the "insert hyperlink" button above the post area (lust to the left of the B (bold) button, then type (or paste) the URL. That will insert a start and stop tag; in between them you can type the link text (as in: click here to get there) or you repeat the URL to get the URL itself in hyperlinked form in the message.
However, copy and paste into a new browser winjdow worked for me as well, so I'm now looking at your problem.
June 16, 2009 at 10:05 am
Toreador (6/16/2009)
If you later find out it should be unicode, you can change it.
I assume you've never been through the process of converting a large database from varchar/char/text columns to nvarchar/nchar/ntext? I have, and it took a very long time!
I wasn't imagining only finding out it should be unicode after the database is in production. That would seem to be an unusual situation, other than a spec change or upgrade which could happen with most any app.
June 16, 2009 at 1:34 pm
The answer is nonsense! The character might well be a Chinese one and that will not fit into char(1) or varchar(1). I work daily with text data that includes 5 languages, two of which don't use anything expressible in Western European encoding, so char and varchar are useless to me. Remember that there are more SQL users outside the ASCII world than inside it. And give the 23% of people who to date have given an answer (nchat(1), nvarchar(1)) that will work in practise their points - they deserve them more than the people who have given the completely unworkable answer that is claimed to be correct!
Myself I picjked nchar(1), not nvarchar(1), because I couldn't see the point in wasting space. But at least nvarchar(1) will work, while neither char(1) nor varchar(1) will work for the question as stated (a character - not specified to be from a character set with 256 or fewer characters using a single byte encoding, so an answer that assumes it is so restricted is just plain wrong).
Quite a few of teh questions I've seen here have wrong answers. Maybe some vetting would be a good idea - but I guess vetting costs time and effort and if teh bad questins are not too frequent it's maybe simpler just to let them happen (and maybe correct them if enough people shout).
I was amazed by one of the comments above : that nchar(1) must be wrong because in real life everything is ascii. Try telling hat to an Arab, a Russian, a Chinaman, an Indian or a Japanese! Or even to an Irishman (no strac geur in ascii) or a Welshman, or to a Scot like me \(many of us can't even spell our name correctly using ascii).
Tom
June 16, 2009 at 1:57 pm
it's not a spec-- it's not specified.. I don't know if the creator of this site is in Chicago or Beijing!
Without that critical piece of information 'who is asking me to do this' we NCHAR people should get points also!
I _INSIST_!
June 17, 2009 at 4:40 am
Christian Buettner (6/5/2009)
I had chosen char(1) instead of varchar(1) because thats what it actually is, a fixed length string.And I had chosen char(1) instead of nchar(1) because I think it is very unlikely that someone would want to store a sincle unicode character in a variable. In the end it will most likely be some code like "V" for view or "T" for table. But I agree that this is subject to the requirements.
I think that sort of assumption is careless and sloppy You might get away with it if you had a field name that supported teh assumption, but even then I would be very unhappy with any developer who made teh assumption when the specification didn't support it.
You're asked to sugest a type for a one character field. One very common such field, in lots and lots of databases, is [Middle Initial]. Oddly enough, Russians tend not to have middle initials in ascii. Ditto Arabs. And so on. Of course if you want to either (a) do business only in the US or (b) piss off your customers in the rest of the world by failing to do them the elementary courtesy of printing their name and address on envelopes in the appropriate language, char(1) will work just fine.
Tom
June 17, 2009 at 5:01 am
The angry Scotsman is right - it is completely wrong to make assumptions about data.
However, unlike most of the people posting on this thread who seem to cling to the spec like a godlike edict from above - my first course of action would be to contact the person that wrote the spec and ask them whether they think they might require nchar. It never hurts to ask.
June 17, 2009 at 5:27 am
Tom.Thomson (6/17/2009)
Christian Buettner (6/5/2009)
I had chosen char(1) instead of varchar(1) because thats what it actually is, a fixed length string.And I had chosen char(1) instead of nchar(1) because I think it is very unlikely that someone would want to store a sincle unicode character in a variable. In the end it will most likely be some code like "V" for view or "T" for table. But I agree that this is subject to the requirements.
[quore]I think that sort of assumption is careless and sloppy
Why? I had written that this is "subject to the requirements". What more can I do?
The problem is not what I chose, the problem is what the specification of the analyst indicates.
It is not always an ideal solution to code against the worst scenario of "incomplete" requirements! Thats the wrong way. The right way is to complete the missing parts of the requirements.
I already wrote it before, too many people use unicode, because they are too lazy to identify the type of data that is to be stored in the fields.
If I get back to the analyst and he tells me that he needs to store initials like you mentioned - OK. But I would never make unicode the default unless I am specifically told to do so.
The problem with the QOTD is that there is no analyst we could get back to. So all we can do is describe our rationales for our answers (like I did and you did). There is no right or wrong with regards to unicode or non-unicode in this QOTD, because it depends on the requirements. As long as they remain unclear, the solution will remain unclear as well.
Best Regards,
Chris BΓΌttner
June 17, 2009 at 5:53 am
Christian Buettner (6/17/2009)
Tom.Thomson (6/17/2009)
Christian Buettner (6/5/2009)
I had chosen char(1) instead of varchar(1) because thats what it actually is, a fixed length string.And I had chosen char(1) instead of nchar(1) because I think it is very unlikely that someone would want to store a sincle unicode character in a variable. In the end it will most likely be some code like "V" for view or "T" for table. But I agree that this is subject to the requirements.
[quore]I think that sort of assumption is careless and sloppy
Why? I had written that this is "subject to the requirements". What more can I do?
The problem is not what I chose, the problem is what the specification of the analyst indicates.
It is not always an ideal solution to code against the worst scenario of "incomplete" requirements! Thats the wrong way. The right way is to complete the missing parts of the requirements.
I already wrote it before, too many people use unicode, because they are too lazy to identify the type of data that is to be stored in the fields.
If I get back to the analyst and he tells me that he needs to store initials like you mentioned - OK. But I would never make unicode the default unless I am specifically told to do so.
The problem with the QOTD is that there is no analyst we could get back to. So all we can do is describe our rationales for our answers (like I did and you did). There is no right or wrong with regards to unicode or non-unicode in this QOTD, because it depends on the requirements. As long as they remain unclear, the solution will remain unclear as well.
I agree with you Christian π
The angry Scotsman is right - it is completely wrong to make assumptions about data.
But those advocating nchar are also making an assumption :blink:
Anyone using nchar on an assumption deserves Pork Chops thrown at them in my view :w00t:
Far away is close at hand in the images of elsewhere.
Anon.
June 17, 2009 at 6:03 am
David Burrows (6/17/2009)
But those advocating nchar are also making an assumption :blink:Anyone using nchar on an assumption deserves Pork Chops thrown at them in my view :w00t:
Hi David,
If forced to make an assumption (which, outside of this QotD setting, should never be the case), I would prefer the option that might be slower and might waste space but that is definitely correct over the option that's faster and leaner but might cause bugs.
If that nets me some tossed meat, then so be it. :kiss:
June 17, 2009 at 6:13 am
Tom.Thomson (6/17/2009)
I think that sort of assumption is careless and sloppy
...and I think that sort of statement is arrogant and presumptious π
Tom.Thomson (6/17/2009)
You might get away with it if you had a field name that supported teh assumption, but even then I would be very unhappy with any developer who made teh assumption when the specification didn't support it.
There is no specification to support this QOD. That's part of the problem.
Teh word is 'the'.
Tom.Thomson (6/17/2009)
You're asked to sugest a type for a one character field. One very common such field, in lots and lots of databases, is [Middle Initial]. Oddly enough, Russians tend not to have middle initials in ascii. Ditto Arabs. And so on.
The overwhelming majority of databases use varchar and char rather than the unicode equivalents.
In the absence of further information, it is perfectly reasonable to go with the more common variant.
Tom.Thomson (6/17/2009)
Of course if you want to either (a) do business only in the US or (b) piss off your customers in the rest of the world by failing to do them the elementary courtesy of printing their name and address on envelopes in the appropriate language, char(1) will work just fine.
A superior attitude is often more annoying than failing to support unicode.
Paul
edit: for spelling
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 121 through 135 (of 182 total)
You must be logged in to reply to this topic. Login to reply