Extended Latin Character - Need help on storing in my DB

  • Hi,

    I need to save the characters below in my DB. My INSERT statement is not working properly.

    I used the nvarchar(max) data type, but it did not work.

    Please note that these are Latin Extended characters, not basic Latin characters. Basic Latin characters are stored without issues.

    For the characters below, even with nvarchar(max) the accents (diacritics) are lost, e.g. Ā ā is stored as A a.

    Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď

    Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ

    Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į

    İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ

    ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ

    Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş

    Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů

    Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ

    Wikipedia name :

    Latin Extended-A

    0100–017F

    Thanks in advance .

    Regards

    Raghav

  • You need to make sure that your string literal is Unicode. Compare the results of the following query, where the first literal is non-Unicode (no prefix) and the second is Unicode (N prefix).

    SELECT 'Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď', N'Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď'

    You're probably using string literals without the N prefix.
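
    The same effect shows up on INSERT. The following is only a minimal sketch, assuming a database whose default collation maps to code page 1252 (for example a Latin1_General collation). The table variable @Demo and its column names are made up for illustration and are not from the original question.

    ```sql
    -- Hypothetical table variable, used only for illustration.
    DECLARE @Demo TABLE (
        Id   int IDENTITY(1,1) PRIMARY KEY,
        Note nvarchar(100)
    );

    -- No N prefix: the literal is varchar, so it is converted to the
    -- database code page before it ever reaches the nvarchar column.
    INSERT INTO @Demo (Note) VALUES ('Ā ā Ă ă');

    -- N prefix: the literal is nvarchar and keeps its Unicode code points.
    INSERT INTO @Demo (Note) VALUES (N'Ā ā Ă ă');

    -- UNICODE() returns the code point of the first character.
    -- Row 2 shows 256 (U+0100, 'Ā'); row 1 shows whatever character
    -- the code page conversion substituted for it.
    SELECT Id, Note, UNICODE(Note) AS FirstCodePoint
    FROM @Demo;
    ```

    On a setup like the one described in the question, where Ā was stored as A, row 1 would be expected to show 65. The column type is not the problem; the data is already altered in the literal.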

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks, Allen, for your reply. I understand now.

  • Non-Unicode Text Types: char, varchar, text, varchar(max)

    When you deal with text data that is stored in the char, varchar, varchar(max), or text data type, the most important limitation to consider is that only information from a single code page can be validated by the system. (You can store data from multiple code pages, but this is not recommended.) The exact code page used to validate and store the data depends on the collation of the column. If a column-level collation has not been defined, the collation of the database is used. To determine the code page that is used for a given column, you can use the COLLATIONPROPERTY function, as shown in the following code examples:

    SELECT COLLATIONPROPERTY('Chinese_PRC_Stroke_CI_AI_KS_WS', 'CodePage')

    936

    SELECT COLLATIONPROPERTY('Latin1_General_CI_AI', 'CodePage')

    1252

    SELECT COLLATIONPROPERTY('Hindi_CI_AI_WS', 'CodePage')

    0

    The last example returns 0 (Unicode) as the code page for Hindi. This example illustrates the fact that many locales, such as Georgian and Hindi, do not have code pages, as they are Unicode-only collations. Those collations are not appropriate for columns that use the char, varchar, or text data type, and some collations have been deprecated. For a list of available collations and which collations are Unicode-only, see Collation Settings in Setup in SQL Server 2005 Books Online.
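
    To apply the same check to your own database rather than to a named collation, something along these lines should work. It is a sketch only: 'dbo.MyTable' is a placeholder table name, and the queries assume SQL Server 2005 or later, where the sys.columns and sys.types catalog views are available.

    ```sql
    -- Code page of the current database's default collation.
    SELECT DB_NAME() AS DatabaseName,
           CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation')) AS DefaultCollation,
           COLLATIONPROPERTY(
               CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation')),
               'CodePage') AS CodePage;

    -- Code page of each character column in one table.
    -- 'dbo.MyTable' is a placeholder name, not one from this thread.
    SELECT c.name AS ColumnName,
           t.name AS DataType,
           c.collation_name AS CollationName,
           COLLATIONPROPERTY(c.collation_name, 'CodePage') AS CodePage
    FROM sys.columns AS c
    JOIN sys.types AS t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID(N'dbo.MyTable')
      AND c.collation_name IS NOT NULL;
    ```

    The first query also tells you which code page a string literal without the N prefix is converted to.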

    Important: In SQL Server 2005, use the varchar(max) data type instead of the text data type. The text data type will be removed in a future version of Microsoft SQL Server. For more information, see Using Large-Value Data Types in SQL Server 2005 Books Online.

    When Unicode data must be inserted into non-Unicode columns, the data is internally converted from Unicode by using the WideCharToMultiByte API and the code page associated with the collation. If a character cannot be represented on the given code page, the character is replaced by a question mark (?). Therefore, the appearance of random question marks within your data is a good indication that your data has been corrupted due to unspecified conversion. It also is a good indication that your application could benefit from conversion to a Unicode data type.
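
    One way to see which values would be altered by such a conversion is to round-trip them through varchar and compare the result with a binary collation. This is a rough sketch with made-up sample data in a table variable. Which rows come back as altered depends on the code page of your database collation.

    ```sql
    -- Sample Unicode values; the table variable is illustrative only.
    DECLARE @Sample TABLE (Txt nvarchar(100));
    INSERT INTO @Sample (Txt) VALUES (N'Plain ASCII text');
    INSERT INTO @Sample (Txt) VALUES (N'Ā ā Ă ă');
    INSERT INTO @Sample (Txt) VALUES (N'中文');

    -- Round-trip each value through varchar and compare with a binary
    -- collation, so that any substituted character counts as a difference.
    SELECT Txt,
           CAST(Txt AS varchar(100)) AS AsVarchar,
           CASE
               WHEN CAST(CAST(Txt AS varchar(100)) AS nvarchar(100))
                    = Txt COLLATE Latin1_General_BIN2
                   THEN 'survives'
               ELSE 'altered'
           END AS RoundTrip
    FROM @Sample;
    ```

    Note that the substitute is not always a question mark. As the original question shows, a character such as Ā can be replaced by a similar-looking letter (A) instead. The binary comparison catches both cases, whereas an accent-insensitive comparison might not.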

    If you use a string literal of a non-Unicode type that is not supported by the collation, the string is converted first using the database's default code page, which is derived from the default collation of the database.

    Another problem you might encounter is the inability to store data when not all of the characters you wish to support are contained in the code page. In many cases, Windows considers a particular code page to be a "best fit" code page, which means there is no guarantee that you can rely on the code page to handle all text; it is merely the best one available. An example of this is the Arabic script: it supports a wide array of languages, including Baluchi, Berber, Farsi, Kashmiri, Kazakh, Kirghiz, Pashto, Sindhi, Uighur, Urdu, and more. All of these languages have additional characters beyond those in the Arabic language as defined in Windows code page 1256. If you attempt to store these extra characters in a non-Unicode column that has the Arabic collation, the characters are converted into question marks.
