Converting Chinese or Japanese double-byte characters to Unicode

  • I'm having trouble converting Japanese and Chinese double-byte characters to Unicode.

    My Chinese characters are in this format:

    GB2312

    http://www.iana.org/assignments/charset-reg/GBK

    Microsoft Code Page 936

    I've had success manually converting them using these two methods:

    Unifier from http://www.melody-soft.com

    The Java converter at http://www.geocities.com/herong_yang/gb2312/

    The converted characters look great with these tools, and I can successfully get them into SQL Server as Unicode under a Latin1_General collation.

    This page from Microsoft says SQL Server 2000 does support code page 936:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ca-co_2e95.asp

    How do I use SQL Server's Chinese_PRC_CI_AI collation to convert GB2312 to Unicode, though? I'd like to do the conversion with the built-in SQL Server collations rather than the external tools and mapping lists that worked.

    For Japanese, I'm using Shift-JIS and got it working with these resources:

    http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT

    Unifier from http://www.melody-soft.com

    Can someone send me a code sample that accepts 2 bytes of varchar in either GB2312 or Shift-JIS format and then uses SQL Server's collation to convert them to a single (double-byte) Unicode nvarchar character?

    Thanks,

    Ray Metz

  • My associate at work showed me how to do it...

    SET NOCOUNT ON

    -- This is your table of GB2312-encoded city names; under a Latin1 collation the raw bytes display as accented characters

    CREATE TABLE #GB2312

      (CITY VARCHAR(30) COLLATE Latin1_General_CI_AI)

    INSERT INTO #GB2312 VALUES('»´±±')

    INSERT INTO #GB2312 VALUES('»´ÄÏ')

    INSERT INTO #GB2312 VALUES('»ÆÉ½ÊÐ')

    -- Start conversion

    -- Create a temporary table to hold your non-Unicode data; its Chinese_PRC collation uses code page 936, so the stored bytes are read as GB2312

    CREATE TABLE #NonUnicode

      (CITY VARCHAR(30) COLLATE Chinese_PRC_CI_AS)

    -- Populate your non-Unicode temp table with varbinary data selected from your GB2312-encoded table, so the bytes are copied without a code page translation

    INSERT INTO #NonUnicode(CITY)

    SELECT CONVERT(VARBINARY(8000),CONVERT(VARCHAR(30),CITY COLLATE Latin1_General_CI_AI))

    FROM #GB2312

    -- Create a table to store your converted data

    CREATE TABLE #Unicode

      (City nvarchar(15) COLLATE Latin1_General_CI_AI)

    -- Populate your Unicode table; the varchar-to-nvarchar conversion decodes the bytes using the Chinese_PRC code page

    INSERT INTO #Unicode

    SELECT CITY COLLATE Chinese_PRC_CI_AS

    FROM #NonUnicode

    -- View your unicode output

    SELECT CITY

    FROM #Unicode

    DROP TABLE #NonUnicode

    DROP TABLE #Unicode

    DROP TABLE #GB2312
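
    The same pattern should carry over to the Shift-JIS data from the question. The following is a minimal sketch rather than tested code. It assumes the Japanese_CI_AS collation (code page 932) takes the place of Chinese_PRC_CI_AS. The table name #SjisBytes and the sample bytes 0x938C8B9E (the Shift-JIS encoding of 東京) are illustrative and not from the original post. For brevity it loads the bytes as a binary literal instead of going through a Latin1 staging table.

    ```sql
    SET NOCOUNT ON

    -- Hypothetical staging table; Japanese_CI_AS is assumed here because
    -- its code page is 932 (Microsoft's Shift-JIS variant)
    CREATE TABLE #SjisBytes
      (CITY VARCHAR(30) COLLATE Japanese_CI_AS)

    -- Inserting VARBINARY into the VARCHAR column stores the bytes as-is,
    -- which is the same trick the GB2312 script above relies on.
    -- 0x938C8B9E is assumed sample data: Shift-JIS for 東京 (Tokyo)
    INSERT INTO #SjisBytes (CITY) VALUES (0x938C8B9E)

    -- Converting to NVARCHAR decodes the bytes with the column's code page
    SELECT CONVERT(NVARCHAR(15), CITY) AS CITY_UNICODE
    FROM #SjisBytes

    DROP TABLE #SjisBytes
    ```

    Code page 932 is Microsoft's variant of Shift-JIS, so a few characters may map differently from the SHIFTJIS.TXT table on unicode.org. It is worth spot-checking the output against the results from the external tools.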
