Need Example of Unicode and Non-unicode data

  • Hello, I need examples of Unicode and non-Unicode data. When I have to create a table and select data types, I get confused about whether the type should be char or nchar (basically, I am not sure which characters are Unicode and which are non-Unicode). I know that Unicode data types will save a lot of space, but can someone give examples of when to use char versus nchar, and when to use nvarchar versus varchar?

    Thanks a lot.

  • In my understanding Unicode characters are extended characters. From BOL:

    The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.

    One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications, or code pages, for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

    Each Microsoft SQL Server 2005 collation has a code page that defines what patterns of bits represent each character in char, varchar, and text values. Individual columns and character constants can be assigned a different code page. Client computers use the code page associated with the operating system locale to interpret character bit patterns. There are many different code pages and some characters appear on some code pages, but not on others. Some characters are defined with one bit pattern on some code pages, and with a different bit pattern on other code pages. When you build international systems that must handle different languages, it becomes difficult to pick code pages for all the computers that meet the language requirements of multiple countries and regions. It is also difficult to ensure that every computer performs the correct translations when interfacing with a system using a different code page.
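    To make the code page problem concrete, here is a minimal sketch in Python (not SQL Server itself). It assumes Python's "cp1252" and "cp1251" codecs as stand-ins for a Western European and a Cyrillic code page, and shows one stored byte being read as two different characters.

```python
# One byte, as a char/varchar column would store a single character.
raw = bytes([0xE8])

# The same bit pattern read under two different code pages.
as_western = raw.decode("cp1252")   # Western European code page
as_cyrillic = raw.decode("cp1251")  # Cyrillic code page

print(as_western, as_cyrillic)

# In Unicode each of these characters has its own code point,
# so the reader does not need to know which code page was in use.
print(hex(ord(as_western)), hex(ord(as_cyrillic)))
```

    The byte never changes; only the interpretation does, which is why a client on a different locale can display the wrong character.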

    The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.
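    A small Python sketch of the 2-byte scheme described above. It assumes the "utf-16-le" codec is a reasonable stand-in for how nchar and nvarchar hold characters; the sample characters are arbitrary picks from different scripts.

```python
# Number of distinct bit patterns available in 2 bytes (16 bits).
patterns = 2 ** 16
print(patterns)

# Characters from several scripts, all encoded by the same scheme.
samples = ["A", "\u00e8", "\u044f", "\u65e5", "\ud55c"]  # A, è, я, 日, 한

# Each of these sample characters occupies exactly 2 bytes.
sizes = {ch: len(ch.encode("utf-16-le")) for ch in samples}
print(sizes)

# Encoding and decoding round-trips without needing a code page.
joined = "".join(samples)
round_trip = joined.encode("utf-16-le").decode("utf-16-le")
print(round_trip == joined)
```

    Characters beyond these 65,536 code points (for example many emoji) need two 2-byte units in UTF-16, so "2 bytes per character" holds for the common business scripts the quote refers to rather than for every character.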

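    So if you need to store characters that fall outside your collation's code page, such as Japanese or Chinese script, you need Unicode types (nchar, nvarchar). Accented characters like "è" do fit in some single-byte code pages (Windows-1252, for example), but as soon as you mix languages Unicode is the safer choice. If you are writing an app that only uses English, then char and varchar should be fine.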
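    One way to check sample data before picking a type is to test whether it fits a single-byte code page. This is a hedged sketch in Python: fits_code_page and suggest_type are hypothetical helper names, and "cp1252" is assumed to match the code page of a typical Latin1 collation.

```python
def fits_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character in text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def suggest_type(text: str, code_page: str = "cp1252") -> str:
    """Suggest varchar when the text fits the code page, else nvarchar."""
    return "varchar" if fits_code_page(text, code_page) else "nvarchar"


# Plain English fits a Western European code page.
print(suggest_type("Hello"))
# Japanese text does not, so it needs a Unicode column.
print(suggest_type("\u65e5\u672c\u8a9e"))  # 日本語
# "è" fits cp1252 but not plain ASCII.
print(fits_code_page("\u00e8"), fits_code_page("\u00e8", "ascii"))
```

    Running a check like this over a representative sample of real data gives a more reliable answer than guessing from the application's primary language.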

  • So from what you have explained, Unicode can handle more than non-Unicode can. But then why would anyone use non-Unicode, if non-Unicode takes double the size of Unicode?

  • navpreet.khurmi (3/24/2008)


    So from what you have explained, Unicode can handle more than non-Unicode can. But then why would anyone use non-Unicode, if non-Unicode takes double the size of Unicode?

    You want to reverse that. nvarchar (which takes Unicode data) takes twice as much space to store data as varchar (which doesn't allow Unicode data). In that sense, a Unicode field stores HALF the data that a non-Unicode field does within the same space.
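    The size difference is easy to see by counting bytes. A minimal Python sketch, assuming "cp1252" approximates varchar storage under a Latin1 collation and "utf-16-le" approximates nvarchar storage; the sample text and the 100-byte budget are arbitrary.

```python
text = "Hello, world"

# Bytes needed for the same English text in each representation.
varchar_bytes = len(text.encode("cp1252"))      # 1 byte per character
nvarchar_bytes = len(text.encode("utf-16-le"))  # 2 bytes per character
print(varchar_bytes, nvarchar_bytes)

# Seen the other way round: how many characters fit in a fixed byte budget.
budget = 100
varchar_capacity = budget // 1
nvarchar_capacity = budget // 2
print(varchar_capacity, nvarchar_capacity)
```

    This is the same ratio behind the type limits: varchar allows up to 8000 characters where nvarchar allows up to 4000.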


  • Thanks Matt. now I understood it.

  • Maybe you can refer here as well: nchar and nvarchar (Transact-SQL)

  • Matt Miller (#4) (3/24/2008)


    navpreet.khurmi (3/24/2008)


    So from what you have explained, Unicode can handle more than non-Unicode can. But then why would anyone use non-Unicode, if non-Unicode takes double the size of Unicode?

    You want to reverse that. nvarchar (which takes Unicode data) takes twice as much space to store data as varchar (which doesn't allow Unicode data). In that sense, a Unicode field stores HALF the data that a non-Unicode field does within the same space.

    So if I'm designing a table that's going to get very large (supporting English only), then one design consideration should be to stay with non-Unicode data types?

  • That is correct. Each character in a Unicode type (nchar, nvarchar) takes 2 bytes of storage, and each character in a non-Unicode type (char, varchar) using a single-byte code page takes 1 byte.
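    For a large table, the difference can be estimated with simple arithmetic. This is a rough Python sketch only: estimated_bytes is a hypothetical helper, the row count and average length are made-up figures, and the estimate ignores per-row overhead, indexes, and any compression.

```python
def estimated_bytes(rows: int, avg_chars: int, bytes_per_char: int) -> int:
    """Raw character storage for one text column, ignoring all overhead."""
    return rows * avg_chars * bytes_per_char


# Hypothetical table: 10 million rows, about 50 characters per value.
rows, avg_chars = 10_000_000, 50

varchar_total = estimated_bytes(rows, avg_chars, 1)   # non-Unicode
nvarchar_total = estimated_bytes(rows, avg_chars, 2)  # Unicode

extra_mb = (nvarchar_total - varchar_total) / (1024 * 1024)
print(varchar_total, nvarchar_total)
print(f"extra space for Unicode: about {extra_mb:.0f} MB")
```

    The extra space also affects how many rows fit on a page and in memory, so for an English-only table that will grow very large the non-Unicode types are a reasonable default.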
