How to compare two strings with different encodings?

  • I am working on some data in two fields. One field is in unicode encoding, the other is in UTF-8.

    The text strings are identical, but when a comparison is done SQL shows them to be not equal due to the encoding difference. Is there a way to overcome the encoding difference to improve the comparison?

  • What is an example of where the same values are being seen as different? I'm running examples where the comparisons (nchar to char) always succeed.

    (Do you rather mean like where an 'é' is compared to an 'e'?)

  • I mean with just normal text.

    I could load the following text, "My cat is furry" into two different encoding, and SQL sees them as different.

  • Unlike Oracle, SQL Server does not support a UTF-8 collation (character set). Since UTF-8 is, in essence, a binary stream, it will have to be converted to the Unicode UCS-2 encoding that is used by SQL Server before it can be compared.

    See the following topic in BOL: http://technet.microsoft.com/en-us/library/ms179342(SQL.90).aspx

    If the data is XML, you may be able to use XML functions to perform the conversion. If it isn't XML, you may need to write a CLR function that can accept UTF-8 encoded data and return a Unicode string. See sample in BOL: http://technet.microsoft.com/en-us/library/ms160893(SQL.90).aspx


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply