UTF-8 Problems with ODBC

Question

nathand

Right there with Babe

Points: 786
March 30, 2007 at 2:03 pm

#370108

I have this stored in the database:
3ß,7α-Dihydroxy-5-androsten-7-one
When I query the row using Query Analyzer it comes out correctly. However, when I query the row using Perl and ODBC it comes out as:
3ß,7a-Dihydroxy-5-androsten-7-one
The ß remains, but the Greek alpha (α) is converted to a plain a.
Using:
Perl 5.8.x (supports UTF-8)
SQL Server 2005 Standard x64
SQL Native Client (ODBC driver v. 2005.90.1399.00)

vadba SSChampion Points: 11133 · Answer 1

It appears that the lower case alpha character is only supported in Unicode. Look at this example (The SET statements have the correct alpha, just in case posting this converted them to 'a'):

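-- Non-Unicode literal: it is converted to the database code page,
-- where the alpha can degrade to a plain 'a'
DECLARE @chem varchar(200)
SET @chem = '3ß,7α-Dihydroxy-5-androsten-7-one'
PRINT @chem

-- The N prefix marks a Unicode literal, so the alpha is preserved
DECLARE @nchem nvarchar(200)
SET @nchem = N'3ß,7α-Dihydroxy-5-androsten-7-one'
PRINT @nchem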
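
The difference between the two characters can be checked outside SQL Server. The following is only a sketch in Python, and it assumes the non-Unicode side is Windows-1252 (`cp1252`), which the thread does not confirm. The helper name `survives_cp1252` is made up for this illustration. It shows that ß has an exact slot in that code page while the Greek alpha does not, so a lossy conversion that substitutes a look-alike letter would leave ß alone and turn α into a plain a, which matches the symptom described above.

```python
# Sketch only: cp1252 is an ASSUMED stand-in for the non-Unicode code page.
name = "3ß,7α-Dihydroxy-5-androsten-7-one"


def survives_cp1252(ch: str) -> bool:
    """Return True if ch has an exact representation in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Collect every character of the name that cannot be stored in cp1252.
lost = [ch for ch in name if not survives_cp1252(ch)]
print(lost)  # ['α']
```

Python raises an error instead of substituting a look-alike, so this only identifies which characters are at risk; it does not reproduce the driver's behaviour.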

nathand Right there with Babe Points: 786 · Answer 2

The data is stored as Unicode (nvarchar).

It's listed correctly in the database. It's just that after it goes through the ODBC driver to Perl, it loses the encoding.

Is there something I have to set in the ODBC driver?

vadba SSChampion Points: 11133 · Answer 3

Sorry, I don't use Perl, so I can't really help with that. I assume you've already searched for help on this, but here are some links that seemed helpful.

http://perl.active-venture.com/pod/perlguts-unicode.html

http://www.ahinea.com/en/tech/perl-unicode-struggle.html

http://userpage.fu-berlin.de/~ram/pub/pub_jf47ht12Ht/perl_unicode_en

nathand Right there with Babe Points: 786 · Answer 4

I don't know that it is a Perl problem. From Perl I can print UTF-8 characters through the web server just fine.

The characters are correct in the table, and I can query them just fine from Query Analyzer.

The problem is that when I read them through an ODBC connection, I lose the characters. Is there something I have to set up in the DSN or connect string to force UTF-8?

Here is an image of the problem:

http://web.mitsi.com/cis.jpg

vadba SSChampion Points: 11133 · Answer 5

Windows and SQL Server use UCS-2 Unicode, which is a fixed-length 2-byte format. UTF-8 is variable-length: it uses one byte for ASCII characters and two to four bytes for everything else. Maybe the characters you are referring to are getting "lost in translation". Queries may work because the characters are converted internally when the query is parsed.
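
To make the size difference concrete, here is a small Python sketch (not part of the original posts). It compares the byte lengths of a few characters in UTF-8 and in UTF-16 little-endian, which matches UCS-2 for characters in the Basic Multilingual Plane. The helper name `byte_lengths` and the sample characters are chosen purely for illustration.

```python
# Sketch: compare how many bytes a character needs in UTF-8 vs UTF-16LE.
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16LE length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ["a", "ß", "\u03b1", "\u20ac"]:  # a, sharp s, Greek alpha, euro sign
    print(repr(ch), byte_lengths(ch))

# The same Greek alpha as raw bytes in each encoding.
alpha_utf16 = "\u03b1".encode("utf-16-le")  # b'\xb1\x03'
alpha_utf8 = "\u03b1".encode("utf-8")       # b'\xce\xb1'

# Converting between the two is lossless as long as each side is
# decoded with the encoding it was actually written in.
assert alpha_utf16.decode("utf-16-le").encode("utf-8") == alpha_utf8
```

Since both encodings can represent the alpha, converting between them should not lose it by itself. A plain a in the output suggests the data passed through a non-Unicode code page somewhere between the server and the client.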

nathand Right there with Babe Points: 786 · Answer 6

SQL Server 2000 uses UCS-2. We specifically upgraded to SQL 2005 because it does support UTF-8.

The queries did not work with SQL Server 2000 because of the UCS-2, but they do work with SQL Server 2005.

I have updated to the ODBC driver that comes with SQL Server 2005, but that's where the problem seems to be happening.

Thanks for your help so far.

Gift Peddie SSC Guru Points: 73570 · Answer 7

Some minor explanation is needed. SQL Server 2005 can natively support UTF-8 because of the .NET CLR in SQL Server 2005, but 2000 also accepts UTF-8, because although the .NET FCL (Framework Class Library) is UTF-16 by default, you can change that in Visual Studio. I am not using Vista, but it may be UTF-16 instead of UCS-2, because .NET, as defined in the early 2000s, is UTF-16 and not UCS-2.

Now to your problem: I think you need to set a Greek collation in SQL Server 2005 to generate UTF-8 Unicode. You should know there is no equivalent of the .NET Char in SQL Server, because .NET Char is the ninth integer and UTF-16 by default, so until you use nvarchar above 200 you are still using bytes. You can save your Perl code as Unicode in Notepad. The chart in the link below was created by the SQL Server team and is a very interesting read. Hope this helps.

http://msdn2.microsoft.com/en-us/library/ms131092.aspx

Kind regards,
Gift Peddie

nathand Right there with Babe Points: 786 · Answer 8

Apparently, using DBI and ADO is the only working way to do this right now.

I was able to get it working.