Extended Latin Character - Need help on storing in my DB

Question

raghavendraselvan.sk

Old Hand

Points: 325
February 1, 2012 at 8:38 am

#263886

Hi,
I need to save the characters below in my DB, but my INSERT statement is not working properly.
I used the nvarchar(max) data type, but it did not help.
Please note that these are Extended Latin characters, not normal Latin characters. Normal Latin characters are stored without issues.
For the characters below, even with nvarchar(max), the accent marks are lost, e.g. Ā ā is stored as Aa.
Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď
Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ
Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
Wikipedia name: Latin Extended-A (0100–017F)
Thanks in advance.
Regards
Raghav

drew.allen SSC Guru Points: 76998 · Answer 1

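You need to make sure that your string literal is Unicode, which means prefixing it with N. Compare the following results, where the first literal is non-Unicode (varchar) and the second is Unicode (nvarchar).

SELECT 'Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď', N'Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď'

You're probably using non-Unicode string literals (no N prefix), so the characters are converted to the database's code page before they reach the nvarchar column.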
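
To see the same effect in an actual INSERT, here is a minimal sketch. The table variable @demo and its column names are made up for illustration, and the results described in the comments assume a database whose default collation uses code page 1252 (for example a Latin1_General collation).

```sql
-- Hypothetical table variable, for illustration only.
DECLARE @demo TABLE (label varchar(20), txt nvarchar(100));

-- Without N the literal is varchar: it is converted to the database's
-- code page before it is assigned to the nvarchar column.
INSERT INTO @demo (label, txt) VALUES ('no prefix', 'Ā ā Ă ă');

-- With N the literal is nvarchar and is stored unchanged.
INSERT INTO @demo (label, txt) VALUES ('N prefix', N'Ā ā Ă ă');

-- UNICODE() returns the code point of the first character.
-- Under the code page 1252 assumption, the first row is expected to
-- start with a plain 'A' (65) and the second with 'Ā' (256).
SELECT label, txt, UNICODE(txt) AS first_code_point
FROM @demo;
```

The column type is the same in both rows; only the literal differs, which is why changing the column to nvarchar alone does not fix the problem.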

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

raghavendraselvan.sk Old Hand Points: 325 · Answer 2

February 1, 2012 at 9:18 am

#1441277

Thanks, Allen, for your reply. I understand now.

johnitech.itech Hall of Fame Points: 3252 · Answer 3

Non-Unicode Text Types: char, varchar, text, varchar(max)

When you deal with text data that is stored in the char, varchar, varchar(max), or text data type, the most important limitation to consider is that only information from a single code page can be validated by the system. (You can store data from multiple code pages, but this is not recommended.) The exact code page used to validate and store the data depends on the collation of the column. If a column-level collation has not been defined, the collation of the database is used. To determine the code page that is used for a given column, you can use the COLLATIONPROPERTY function, as shown in the following code examples:

SELECT COLLATIONPROPERTY('Chinese_PRC_Stroke_CI_AI_KS_WS', 'CodePage')

936

SELECT COLLATIONPROPERTY('Latin1_General_CI_AI', 'CodePage')

1252

SELECT COLLATIONPROPERTY('Hindi_CI_AI_WS', 'CodePage')

0

The last example returns 0 (Unicode) as the code page for Hindi. This example illustrates the fact that many locales, such as Georgian and Hindi, do not have code pages, as they are Unicode-only collations. Those collations are not appropriate for columns that use the char, varchar, or text data type, and some collations have been deprecated. For a list of available collations and which collations are Unicode-only, see Collation Settings in Setup in SQL Server 2005 Books Online.
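
The examples above pass a collation name directly. As a rough sketch, the same function can be pointed at the collations actually in use. Here dbo.MyTable is a hypothetical table name, not something from this thread; the catalog view and functions are standard in SQL Server 2005 and later.

```sql
-- Default collation of the current database and its code page.
-- DATABASEPROPERTYEX returns sql_variant, so it is converted explicitly.
SELECT
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation,
    COLLATIONPROPERTY(
        CONVERT(nvarchar(128), DATABASEPROPERTYEX(DB_NAME(), 'Collation')),
        'CodePage') AS database_code_page;

-- Collation and code page of each character column in one table.
-- dbo.MyTable is a hypothetical name; replace it with a real table.
SELECT
    c.name                      AS column_name,
    TYPE_NAME(c.user_type_id)   AS data_type,
    c.collation_name,
    COLLATIONPROPERTY(c.collation_name, 'CodePage') AS code_page
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.MyTable')
  AND c.collation_name IS NOT NULL;  -- non-character columns have no collation
```

A code page of 0 on a char or varchar column is the case the paragraph above warns about.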

Important: In SQL Server 2005, use the varchar(max) data type instead of the text data type. The text data type will be removed in a future version of Microsoft SQL Server. For more information, see Using Large-Value Data Types in SQL Server 2005 Books Online.

When Unicode data must be inserted into non-Unicode columns, the data is internally converted from Unicode by using the WideCharToMultiByte API and the code page associated with the collation. If a character cannot be represented on the given code page, the character is replaced by a question mark (?). Therefore, the appearance of random question marks within your data is a good indication that your data has been corrupted by an unintended conversion. It also is a good indication that your application could benefit from conversion to a Unicode data type.
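
The conversion can be reproduced in isolation. The sketch below assumes that Latin1_General_CI_AI maps to code page 1252, as the COLLATIONPROPERTY example earlier in this post reports. Note that not every unsupported character becomes a question mark: some are replaced by a similar-looking character from the code page, which matches the original poster's report of Ā being stored as A. The names dbo.MyTable and txt in the second query are hypothetical.

```sql
DECLARE @u nvarchar(20);
SET @u = N'Ā ā 中';

-- COLLATE selects the code page used for the conversion to varchar.
-- Under the code page 1252 assumption, the accented letters are expected
-- to come back without accents and the Chinese character as '?'.
SELECT
    @u AS original_unicode,
    CONVERT(varchar(20), @u COLLATE Latin1_General_CI_AI) AS converted_text;

-- Rough check for stored values that would not survive a round trip
-- through the database's default code page. The binary collation makes
-- the comparison sensitive to accents and case.
SELECT txt
FROM dbo.MyTable
WHERE txt <> CONVERT(nvarchar(max), CONVERT(varchar(max), txt))
             COLLATE Latin1_General_BIN2;
```

A check like this only finds values that are still intact in a Unicode column; once data has been converted to A or ?, the original characters cannot be recovered from the stored value.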

If you use a string literal of a non-Unicode type that is not supported by the collation, the string is converted first using the database's default code page, which is derived from the default collation of the database.

Another problem you might encounter is the inability to store data when not all of the characters you wish to support are contained in the code page. In many cases, Windows considers a particular code page to be a "best fit" code page, which means there is no guarantee that you can rely on the code page to handle all text; it is merely the best one available. An example of this is the Arabic script: it supports a wide array of languages, including Baluchi, Berber, Farsi, Kashmiri, Kazakh, Kirghiz, Pashto, Sindhi, Uighur, Urdu, and more. All of these languages have additional characters beyond those in the Arabic language as defined in Windows code page 1256. If you attempt to store these extra characters in a non-Unicode column that has the Arabic collation, the characters are converted into question marks.