Unicode and Chinese Characters

  • Sorry if this turns out to be a double post - I thought I had posted it, but afterwards I couldn't find it (and I lost the original post text) ....

    We are investigating the issues involved in receiving Unicode data from our Chinese subsidiary into our data warehouse. The subsidiary will be using an Oracle database, in Unicode, with the Simplified Chinese character set. We will set up an extraction process to import their sales data into our SQL Server data warehouse via ASCII text files. Since our data warehouse is not currently in Unicode, we have some issues here. As I understand it, Unicode uses two bytes per character, which would mean doubling the storage and memory requirements of our SQL Server database if we convert it to Unicode.
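    To put rough numbers on the storage concern, here is a minimal Python sketch that counts bytes for the same text in a single-byte encoding, in a two-byte-per-character Unicode encoding (UTF-16, comparable to what SQL Server's nchar/nvarchar types use), and in UTF-8. The function name and the sample strings are made up for illustration, and the sketch only measures encoded text; it does not model SQL Server's actual row or index overhead.

```python
def storage_bytes(text: str) -> dict:
    """Count how many bytes the same text needs in different encodings."""
    return {
        "chars": len(text),
        # Single-byte storage: anything outside ASCII collapses to "?"
        "single_byte": len(text.encode("ascii", errors="replace")),
        # Two bytes per character for these (Basic Multilingual Plane) samples
        "utf16": len(text.encode("utf-16-le")),
        # Variable width: 1 byte for ASCII, 3 bytes for these Chinese characters
        "utf8": len(text.encode("utf-8")),
    }


# Hypothetical sample values, not taken from any real data
western = "Widget 42"
chinese = "销售数据"

for label, sample in (("western", western), ("chinese", chinese)):
    print(label, storage_bytes(sample))
```

    The doubling applies only to character data that is actually stored in a two-byte encoding; numeric and date columns are unaffected.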

    The Chinese data will be mostly normal Western/English alphanumeric characters, but all the descriptions (products, customers, salesmen, etc.) will be in local Simplified Chinese characters. The issue is how to deal with these local descriptions. The primary use of the data warehouse is at group (headquarters) level, not local level, so there will be limited (if any) use of the data warehouse data by the Chinese staff. Those who may use it will be management and are likely to speak English.

    Our options seem to be:

    a) convert our data warehouse to Unicode

    b) set up a parallel data warehouse in Unicode to handle the Chinese data (and report it separately from the rest)

    c) do nothing and allow the Chinese data to import as garbage ASCII

     

    Some questions:

    1) If we convert the SQL Server data warehouse to Unicode, does the Windows operating system also need to be converted to Unicode?

    2) Are there any issues with accepting the "garbage ASCII" characters?

    3) Any other issues that I may not be aware of?

     

    THANKS

     

     

  • I've never worked with Chinese characters, but I've dealt with Unicode before, so let me try to answer as much as I can. However, everything I say here is only a suggestion, not proven to be right, so please test before you implement anything.

    If you have SQL2K, you don't have to convert the whole database to Unicode; you can use columns of a Unicode type such as ntext or nvarchar, and that should be enough to store any Unicode character. I'd also recommend changing the collation if you want to dedicate such a column to Chinese, for example "Dictionary order, case-insensitive, for use with the 950 (Traditional Chinese) character set." That one is for Traditional Chinese; since your data is Simplified Chinese, look for the equivalent collation for code page 936. As for c) do nothing and allow the ..., if you don't want to do anything with that data, why would you even import it and waste the space, unless you need it for something else?

    On the Windows operating system side, I'd recommend adding the language under Regional Settings if you are using Windows 2000 or later, so that you can at least see the characters in Query Analyzer and run bcp more easily. If you don't, importing and exporting may corrupt the characters from time to time, because the application will just use an ASCII code page for the import and convert back to Unicode unless you tell it otherwise...
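    To illustrate that last point, here is a small Python sketch of what happens when Chinese text is pushed through an encoding that cannot represent it. This is only an illustration of the general encoding behaviour, not of what bcp or SQL Server do internally; the function name and the sample string are made up.

```python
def round_trip(text: str, codec: str) -> str:
    """Encode text with the given codec and decode it again.

    Characters the codec cannot represent are replaced with "?",
    and that replacement cannot be undone afterwards.
    """
    raw = text.encode(codec, errors="replace")
    return raw.decode(codec)


# Hypothetical sample value mixing Chinese and Western characters
original = "客户: ACME 上海"

for codec in ("ascii", "cp1252", "gbk", "utf-16-le"):
    result = round_trip(original, codec)
    status = "intact" if result == original else "damaged"
    print(f"{codec:10} {status:8} {result}")
```

    The Western part of the value survives every codec, while the Chinese part survives only the codecs that can represent it (here gbk, the Simplified Chinese encoding, and utf-16-le). Once the characters have been replaced there is no way to get them back from the file.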

    Hope this is a little bit helpful.
