July 25, 2005 at 6:34 pm
I'd like to ask some of my colleagues what critical areas should be considered when the question of "Unicode/Globalization" comes up. For those of you who have successfully converted your systems to Unicode, I salute you!
I have a few questions for those of you who have been through this. Any help would, naturally, be appreciated since we DBAs are very busy.
Are there any "gotchas" when we convert our char, varchar, etc. columns to nchar, nvarchar, and so on? What general methods have some of you used to convert these column types? Did you just change the column's data type in Enterprise Manager (EM), or will that come back and "bite me" in some way? For example, when I change from char to nchar, will SQL Server alter the contents of the field in any way I should be aware of? Or should a different method be used?
July 28, 2005 at 2:32 pm
There are no gotchas of that kind.
The only gotcha is if you have used the DATALENGTH function on any strings to test their length: it returns the length in bytes, and nchar/nvarchar store two bytes per character, so the value doubles after conversion. Otherwise, you should be OK; I haven't seen any weirdness.
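To make the DATALENGTH point concrete, here is a minimal Python sketch that compares a character count with the byte count under each storage type. It assumes varchar columns use Windows code page 1252 and that nchar/nvarchar values are stored as UTF-16LE; `len_like` and the `datalength_*` helpers are hypothetical stand-ins for SQL Server's LEN and DATALENGTH, not real APIs.

```python
# Hypothetical stand-ins for SQL Server's LEN and DATALENGTH (not real APIs).
# Assumptions: varchar uses Windows code page 1252; nchar/nvarchar use UTF-16LE.


def len_like(value: str) -> int:
    """Character count, ignoring trailing spaces the way LEN does."""
    return len(value.rstrip(" "))


def datalength_varchar(value: str, codepage: str = "cp1252") -> int:
    """Bytes the value occupies in a single-byte code page column."""
    return len(value.encode(codepage))


def datalength_nvarchar(value: str) -> int:
    """Bytes the value occupies as UTF-16LE (two per BMP character)."""
    return len(value.encode("utf-16-le"))


for value in ["abc", "abc  ", "café"]:
    print(repr(value), len_like(value), datalength_varchar(value), datalength_nvarchar(value))
# 'abc' 3 3 6
# 'abc  ' 3 5 10
# 'café' 4 4 8
```

Any check written as "DATALENGTH(col) = n" that assumed one byte per character would need revisiting after the conversion, while a character-count check is unaffected.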
July 28, 2005 at 4:57 pm
Thanks! That helps answer a few questions I had. I do appreciate you taking time from your schedule to jot down a couple of notes. It has been a fast week, as I'm sure all will agree.
February 9, 2006 at 1:55 pm
(Post-scriptum caveat: I'm far from an expert. I'm just reporting some simple things of which I myself was not aware some years ago, in case they are of any help to others.)
Things I'd suggest:
* Be aware of the limitations of UCS-2LE, which can only represent characters in the Basic Multilingual Plane (U+0000 to U+FFFF).
My guess is this only affects GB18030 Chinese.
* Realize that string length is not the same as byte length.
Look up "composing characters" in Unicode if you're not familiar with them.
* Realize that display text width is much more complex than character length (due to non-spacing characters).
* Realize that string equality requires considering how you want to handle "composing characters".
* Read up on UTF-8, the rule that every character must use its minimal (shortest) encoding, and the security implications of accepting non-minimal ("overlong") forms.
* Realize that there are a few corner cases where uppercasing and lowercasing do not round-trip, i.e., lowercasing an uppercased string does not always give back the original (e.g., Greek sigma, Turkish i, German ß/SS).
* Review caret and cursor behavior for right-to-left languages, and for the case of mixing, say, Arabic and French (i.e., mixing RTL and LTR text).
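Several of the points above (the UCS-2 limit, string length versus byte length, and equality with composing characters) can be seen in a few lines of Python. This is a sketch only: the sample characters are arbitrary, `utf16_code_units` is a hypothetical helper, and NFC normalization is shown as one common way to compare composed and decomposed text, not the only policy.

```python
import unicodedata


def utf16_code_units(text: str) -> int:
    """Number of 16-bit units the text needs in UTF-16 (UCS-2 has no surrogate pairs)."""
    return len(text.encode("utf-16-le")) // 2


# U+20000 (CJK Unified Ideographs Extension B) lies outside the BMP:
# one code point, but two UTF-16 code units (a surrogate pair).
ext_b = "\U00020000"
print(len(ext_b), utf16_code_units(ext_b))  # 1 2

# The same visible "é" as one precomposed code point or as "e" + COMBINING ACUTE ACCENT.
composed = "\u00e9"
decomposed = "e\u0301"
print(len(composed), len(decomposed))  # 1 2
print(composed == decomposed)  # False: a naive comparison sees different strings
print(unicodedata.normalize("NFC", decomposed) == composed)  # True after normalization
print(len(decomposed.encode("utf-8")))  # 3 bytes for 2 code points
```

Whether to normalize before comparing (and to NFC or NFD) is an application decision; the sketch just shows that without some such step, two visually identical strings compare unequal.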
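The case-mapping corner cases can be reproduced with Python's built-in string methods. Note the assumption: `str.upper` and `str.lower` apply the default, locale-independent Unicode mappings, so Turkish-specific rules (I to dotless ı) are not applied here.

```python
# Python's str.upper/str.lower use default (locale-independent) Unicode case mappings.

# German: "ß" upper-cases to two letters, and lower-casing gives "ss", not "ß".
print("straße".upper())  # STRASSE
print("straße".upper().lower())  # strasse

# Greek: final sigma "ς" and medial sigma "σ" both upper-case to "Σ",
# so the distinction is lost in the round trip.
print("ς".upper() == "σ".upper())  # True
print("ς".upper().lower())  # σ

# Turkish: "İ" (U+0130) lower-cases to "i" + COMBINING DOT ABOVE (two code points),
# and upper-casing that result does not give back U+0130.
dotted = "\u0130"
print(len(dotted.lower()))  # 2
print(dotted.lower().upper() == dotted)  # False
```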
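On minimal UTF-8 encoding: the classic attack uses an overlong form of "/" to slip a path separator past a filter that inspects raw bytes before decoding. A strict decoder rejects such input. The sketch below relies on Python's strict "utf-8" codec; `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (overlong forms are rejected)."""
    try:
        data.decode("utf-8")  # Python's strict decoder refuses non-minimal encodings
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"/"))  # True: minimal one-byte form of "/"
print(is_valid_utf8(b"\xc0\xaf"))  # False: overlong two-byte form of "/"
print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: overlong three-byte form of "/"
print(is_valid_utf8("é".encode("utf-8")))  # True
```

The design point is to validate (decode strictly) first and filter the decoded text afterwards, rather than the other way round.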
February 15, 2006 at 7:43 am
In .NET, for string equality, implement the IComparer interface in the application layer. Unlike the IComparable interface, which a type implements to compare its own instances, IComparer lives in a separate class and lets different types be compared. UCS-2LE uses less space for CJK text, and UTF-8 is byte-for-byte identical to ASCII for ASCII characters. Try the thread below, where I explained Chinese-specific Unicode in SQL Server on another forum. Hope this helps.
http://forums.asp.net/1067798/ShowPost.aspx
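Which encoding is smaller depends on the text being stored. A quick Python check with two arbitrary sample strings (an assumption, not data from the linked thread) shows the trade-off between UTF-8 and UTF-16LE:

```python
samples = {"ASCII": "Hello", "Chinese": "中文字符"}

for label, text in samples.items():
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))
    print(f"{label}: UTF-8 {utf8_size} bytes, UTF-16LE {utf16_size} bytes")
# ASCII: UTF-8 5 bytes, UTF-16LE 10 bytes
# Chinese: UTF-8 12 bytes, UTF-16LE 8 bytes

# For pure ASCII text, UTF-8 bytes are identical to ASCII bytes.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

Common CJK characters take three bytes each in UTF-8 but two in UTF-16LE, while ASCII takes one byte in UTF-8 and two in UTF-16LE.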
Kind regards,
Gift Peddie
February 15, 2006 at 12:05 pm
But on that thread you said "Dictionary order" for Chinese.
Chinese dictionaries use several different orders, in my experience:
* First by radical, and then by stroke count (I think this is the most common order?). But I suppose this must be subdivided into two variants, based on whether traditional or simplified stroke counts are used.
* By stroke count. Again, I guess there are two variants of this.
* Phonetic, by pinyin Latin alphabetization.
* Phonetic, by bopomofo order.
When you say "dictionary order", do you mean "by radical then by stroke, using simplified stroke counts"?
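To see why "dictionary order" is ambiguous, here is a small Python sketch that sorts the same four characters three ways. The pinyin and stroke counts are hand-entered sample data for illustration, not output from any SQL Server collation, and the radical-then-stroke order is not modeled.

```python
# Hand-entered sample data (illustrative only): character -> (pinyin, stroke count)
SAMPLE = {
    "一": ("yi", 1),
    "人": ("ren", 2),
    "大": ("da", 3),
    "中": ("zhong", 4),
}

chars = list(SAMPLE)
code_point_order = sorted(chars)  # plain binary sort on Unicode code points
stroke_order = sorted(chars, key=lambda c: SAMPLE[c][1])  # by total stroke count
pinyin_order = sorted(chars, key=lambda c: SAMPLE[c][0])  # by pinyin spelling (tones ignored)

print("".join(code_point_order))  # 一中人大
print("".join(stroke_order))  # 一人大中
print("".join(pinyin_order))  # 大人一中
```

Three reasonable orders give three different results for the same four characters, which is why the name of a collation matters more than a generic label.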
February 15, 2006 at 1:17 pm
Perry,
Those codes are straight out of SQL Server 2000 Books Online (BOL). I was working with someone who is Chinese, and they helped resolve his problems. I can't go into detail with you now, because in my current project my RDBMS (relational database management system) is Oracle 10g 64-bit. I am still in the design stage, but it will be deployed in at least 33 countries.
Kind regards,
Gift Peddie
February 15, 2006 at 5:43 pm
If that is a subtle way of saying that you don't know, that's fine. I'm only an amateur myself, in both the Chinese language and g10n, and I'm just posting because no one else seems to (plus, I had a real question about corruption, and I like to try to answer other people's questions when I ask one of my own, out of a sense of fairness).
February 16, 2006 at 5:43 am
Perry,
No, that is not my way of telling you I don't know. Sorting requires comparing values, and SQL Server 2000 Books Online (BOL) says the best is Binary, but then SQL Server must be case-sensitive. I am not an academic; I only know what works. There are six different Chinese sorts in SQL Server. You are running SQL Server and I am not, so you can run some tests. Richard, the person I was helping, is of Chinese descent, and he said it is based on pronunciation, so I would assume Dictionary order is based on the 2000-plus Chinese "alphabet". The point is that he said thanks and did not come back, which means his problem was solved.
Kind regards,
Gift Peddie
February 17, 2006 at 9:25 am
Sorry, I get it now -- you were just copying and pasting text from Books Online, and don't necessarily understand what you pasted -- so I needn't be asking you to explain what you posted.
I don't see the text "Dictionary Order" you quoted in my Books Online, but I do see some collations labelled stroke order, so it might be a poor way (by Microsoft, not you) of saying that somewhere. Stroke order is not phonetic, but you've already explained that you don't use SQL Server, so it won't matter to you; this is a nice, fun, pointless explanation, isn't it?
In case anyone ever does read this -- although I hope not -- I'll summarize:
My Books Online says:
202 Chinese_Taiwan_Stroke_CS_AS
so apparently 202 is collated on stroke order.
February 17, 2006 at 9:42 am
I am both MCSE and MCDBA certified and I am a SQL Server expert, so your comment about me not using SQL Server is not relevant. Richard is of Chinese descent; I will take what he tells me about Chinese in SQL Server over what you say. For people who read only this page: the Windows code page is different, and I have posted the SQL Server information in the link I provided. Here is that information:
Sort order ID | Collation name | Description |
196 | Chinese_Taiwan_Stroke_BIN | Binary order, for use with the 950 (Traditional Chinese) character set. |
197 | Chinese_Taiwan_Stroke_CI_AS | Dictionary order, case-insensitive, for use with the 950 (Traditional Chinese) character set. |
198 | Chinese_PRC_BIN | Binary order, for use with the 936 (Simplified Chinese) character set. |
199 | Chinese_PRC_CI_AS | Dictionary order, case-insensitive, for use with the 936 (Simplified Chinese) character set. |
200 | Japanese_CS_AS | Dictionary order, case-sensitive, for use with the 932 (Japanese) character set. |
201 | Korean_Wansung_CS_AS | Dictionary order, case-sensitive, for use with the 949 (Korean) character set. |
202 | Chinese_Taiwan_Stroke_CS_AS | Dictionary order, case-sensitive, for use with the 950 (Traditional Chinese) character set. |
203 | Chinese_PRC_CS_AS | Dictionary order, case-sensitive, for use with the 936 (Simplified Chinese) character set. |
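The character-set column in that table matters because code pages 950 and 936 encode the same character with different bytes. A small Python check, assuming Python's `cp950` and `cp936` codecs as stand-ins for those Windows code pages:

```python
text = "中"

big5_bytes = text.encode("cp950")  # code page 950, Traditional Chinese (Big5)
gbk_bytes = text.encode("cp936")  # code page 936, Simplified Chinese (GBK)
print(big5_bytes.hex(), gbk_bytes.hex())  # a4a4 d6d0

# Reading Big5 bytes with the wrong code page yields a different character.
misread = big5_bytes.decode("cp936", errors="replace")
print(misread == text)  # False
```

Storing the data as nchar/nvarchar avoids this ambiguity, because the stored value no longer depends on which code page was used to interpret it.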
Kind regards,
Gift Peddie