March 30, 2007 at 2:03 pm
Edit: It looks like this site doesn't store the Greek alpha character either.
I have this stored in the database:
3ß,7a-Dihydroxy-5-androsten-7-one
When I query the row using Query Analyzer it comes out correctly. However, when I query the row using Perl and ODBC it comes out as:
3ß,7a-Dihydroxy-5-androsten-7-one
The ß remains, but the alpha is converted to a plain a.
Using:
Perl 5.8.x (supports UTF-8)
SQL Server 2005 Standard x64
SQL Native Client (ODBC driver v. 2005.90.1399.00)
April 2, 2007 at 7:41 am
It appears that the lower case alpha character is only supported in Unicode. Look at this example (The SET statements have the correct alpha, just in case posting this converted them to 'a'):
DECLARE @chem varchar(200)
SET @chem = '3ß,7a-Dihydroxy-5-androsten-7-one'
PRINT @chem
DECLARE @nchem nvarchar(200)
SET @nchem = N'3ß,7a-Dihydroxy-5-androsten-7-one'
PRINT @nchem
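The difference between the two PRINT results can be reproduced outside SQL Server. Below is a minimal sketch in Python (used only for illustration; the client in this thread is Perl). It assumes the varchar value is held in the Windows-1252 code page, which the thread does not state. The helper name `fits_in_code_page` is hypothetical, and the alpha is written here as α.

```python
name = "3ß,7α-Dihydroxy-5-androsten-7-one"

def fits_in_code_page(ch: str, code_page: str = "cp1252") -> bool:
    """Return True if ch can be encoded in the given single-byte code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Characters that cannot survive a conversion to the assumed code page.
lost = [ch for ch in name if not fits_in_code_page(ch)]
print(lost)  # ['α']

# Python substitutes '?' for unencodable characters when asked to replace them.
replaced = name.encode("cp1252", errors="replace").decode("cp1252")
print(replaced)  # 3ß,7?-Dihydroxy-5-androsten-7-one
```

Python substitutes a question mark, whereas the output reported in this thread shows a plain a. That would be consistent with a best-fit conversion to a single-byte code page somewhere between the server and the client, but that is an assumption rather than something the thread confirms.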
April 2, 2007 at 10:03 am
The data is stored as Unicode (nvarchar).
It's listed correctly in the database. It's just that after it goes through the ODBC driver to Perl, it loses the encoding.
Is there something I have to set in the ODBC driver?
April 2, 2007 at 11:14 am
Sorry, I don't use Perl, so I can't really help with that. I assume you've already searched for help on this, but here are some links that seemed helpful.
http://perl.active-venture.com/pod/perlguts-unicode.html
http://www.ahinea.com/en/tech/perl-unicode-struggle.html
http://userpage.fu-berlin.de/~ram/pub/pub_jf47ht12Ht/perl_unicode_en
April 23, 2007 at 2:18 pm
I don't know that it is a Perl problem. From Perl I can print UTF-8 characters through the web server just fine.
The characters are correct in the table; I can query them just fine from Query Analyzer.
The problem is that when I read them through an ODBC connection I lose the characters. Is there something I have to set up in the DSN or connection string to force UTF-8?
Here is an image of the problem:
April 24, 2007 at 9:31 am
Windows and SQL Server use UCS-2 Unicode, which is a fixed-length 2-byte format. UTF-8 is variable-length: it uses 8 bits for some characters and 16 bits or more for others. Maybe the characters you are referring to are getting "lost in translation". Queries may work because the characters are converted internally when the query is parsed.
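To make the size difference concrete, here is a small Python sketch (again only for illustration, not the Perl client from this thread) that compares how many bytes the characters in question take in each encoding. The `wide` payload is a hypothetical stand-in for what a wide-character API might hand back; the thread does not show what the ODBC driver actually returns.

```python
# Byte widths of the three characters at issue in each encoding.
widths = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for ch in ("a", "ß", "α")
}
for ch, (utf8_len, utf16_len) in widths.items():
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16LE {utf16_len} byte(s)")

# Hypothetical wide-character payload, re-encoded for a UTF-8 client.
wide = "7α".encode("utf-16-le")
as_utf8 = wide.decode("utf-16-le").encode("utf-8")
print(wide, as_utf8)
```

The conversion itself is lossless as long as the data stays in a Unicode encoding end to end. Characters only get dropped when some step falls back to a single-byte code page.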
April 24, 2007 at 9:50 am
SQL Server 2000 uses UCS-2. We specifically upgraded to SQL 2005 because it does support UTF-8.
The queries did not work with SQL Server 2000 because of the UCS-2, but they do work with SQL Server 2005.
I have updated to the ODBC driver that comes with SQL Server 2005, but that's where the problem seems to be happening.
Thanks for your help so far.
April 28, 2007 at 4:56 pm
Some minor explanation is needed. SQL Server 2005 can natively support UTF-8 because of the .NET CLR in SQL Server 2005, but 2000 also accepts UTF-8 because, although the .NET FCL (Framework Class Library) is UTF-16 by default, you can change that in Visual Studio. I am not using Vista, but it may be UTF-16 instead of UCS-2, because .NET has by definition been UTF-16, not UCS-2, since the early 2000s.
Now to your problem: I think you need to set a Greek collation in SQL Server 2005 to generate UTF-8 Unicode. You should know there is no equivalent of the .NET Char in SQL Server, because .NET Char is the ninth integer and UTF-16 by default, so until you use nvarchar above 200 you are still using bytes. You can save your Perl code as Unicode in Notepad. The chart in the link below was created by the SQL Server team and is a very interesting read. Hope this helps.
http://msdn2.microsoft.com/en-us/library/ms131092.aspx
Kind regards,
Gift Peddie
October 2, 2007 at 3:23 pm
Apparently, using DBI and ADO is the only working way to do this right now.
I was able to get it working.